Implementation of Search in Enterprise Applications

Step 1: Learn about Text Search
Learning the fundamentals is always important before working with tools and software.

Text Retrieval and Search Engines
by ChengXiang Zhai

Thanks to the University of Illinois and Mr. ChengXiang Zhai.

[Diagram: text_retrieval_system]

Text Retrieval System:
[Diagram: tr_architecture]

Copyright on the above diagrams belongs to the University of Illinois.

Please go through the course to understand all of the above.

Step 2: Learn Lucene
Lucene was written in Java by Doug Cutting in 1999. It encapsulates many of the algorithms covered in the course above.
The name Lucene is the middle name of Doug Cutting's wife.
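
To make "algorithms learned above" concrete, here is a minimal sketch of the core idea behind a text search library: an inverted index plus a simple TF-IDF ranking. This is not Lucene's code or its actual scoring formula. The function names (tokenize, build_index, search) and the sample documents are made up for illustration, and the whitespace tokenizer is a stand-in for a real analyzer.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for a real analyzer)."""
    return text.lower().split()


def build_index(docs):
    """Build an inverted index: term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    return index


def search(index, num_docs, query):
    """Rank documents by a simple TF-IDF sum over the query terms."""
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, {})
        if not postings:
            continue  # term does not occur in any document
        # Rare terms get a higher weight than common ones.
        idf = math.log(1 + num_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical sample documents, keyed by document id.
docs = {
    1: "lucene is a search library",
    2: "solr is a search server built on lucene",
    3: "nutch is a web crawler",
}
index = build_index(docs)
results = search(index, len(docs), "lucene search")
```

The point of the inverted index is that a query only touches the postings of its own terms instead of scanning every document. That is the same basic trade-off a real search library makes, at a much larger scale.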

Reference:
Features: https://lucene.apache.org/core/
Lucene Tutorial: http://www.lucenetutorial.com/
Lucene Concepts: http://www.lucenetutorial.com/basic-concepts.html
Lucene in 5 minutes: http://www.lucenetutorial.com/lucene-in-5-minutes.html

Step 3: Learn Solr / Elasticsearch
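Lucene is like an engine: a library with a focused feature set that you embed in your own application when required. Solr and Elasticsearch are like cars that run on that engine: complete search servers built on top of Lucene.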
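
As a rough illustration of the "car" side, a search server is queried over HTTP rather than embedded as a library. The sketch below builds a full-text `match` query for Elasticsearch's `_search` endpoint using only the Python standard library. The node address, the index name "articles", the field name "title" and the helper build_search_request are all assumptions made for this example. The request is only built here, not sent.

```python
import json
import urllib.request


def build_search_request(base_url, index_name, field, text):
    """Build (but do not send) a POST request carrying a `match` query."""
    body = {"query": {"match": {field: text}}}
    return urllib.request.Request(
        url=f"{base_url}/{index_name}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical values: a local node, an index called "articles", a "title" field.
request = build_search_request(
    "http://localhost:9200", "articles", "title", "enterprise search"
)

# To run the query against a live node, something like this should work:
# with urllib.request.urlopen(request) as response:
#     hits = json.load(response)["hits"]["hits"]
```

With embedded Lucene, the equivalent step would be a call inside your own Java process. With a server, any language that can speak HTTP and JSON can search.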

Reference:
Lucene Vs Solr: http://www.lucenetutorial.com/lucene-vs-solr.html
Glossary of terms: https://www.elastic.co/guide/en/elasticsearch/reference/1.4/glossary.html
Elasticsearch features: https://www.elastic.co/products/elasticsearch

Step 4: Learn Apache Nutch
A search engine needs data to index. To collect data from the web we need a web crawler, and Apache Nutch does this job.
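
To show what a crawler does in principle, here is a small breadth-first crawl sketch: fetch a page, extract its links, and queue the ones not seen yet. This is not how Nutch is implemented. The names LinkExtractor, crawl and fake_web are made up for this example, and an in-memory dictionary stands in for real HTTP fetches so the sketch runs without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed, fetch, max_pages=10):
    """Breadth-first crawl from `seed`; `fetch(url)` returns HTML or None."""
    frontier, seen, pages = deque([seed]), {seed}, {}
    while frontier and len(pages) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:
            continue  # page could not be fetched, skip it
        pages[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href)  # resolve relative links
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return pages


# A tiny in-memory "web" standing in for real HTTP fetches (hypothetical URLs).
fake_web = {
    "http://example.com/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.com/a": '<a href="/">home</a>',
    "http://example.com/b": "no links here",
}
pages = crawl("http://example.com/", fake_web.get)
```

The collected pages would then be handed to the indexer. A production crawler also has to deal with politeness rules, robots.txt, deduplication and scale, which is why a dedicated tool is used for this step.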

Reference:
https://en.wikipedia.org/wiki/Apache_Nutch

The above is the minimum knowledge needed before architecting any search-based solution.

-o-
