Step 1: Learn about Text Search
Learning the fundamentals is always important before dealing with tools and software.
Thanks to the University of Illinois and Mr. ChengXiang Zhai.
Copyright on the above diagrams belongs to the University of Illinois.
Please go through the course to understand all of the above.
Step 2: Learn Lucene
Lucene was developed in Java by Doug Cutting in 1999. It implements many of the algorithms learned above.
Lucene is the middle name of Doug Cutting's wife.
Lucene Tutorial: http://www.lucenetutorial.com/
Lucene Concepts: http://www.lucenetutorial.com/basic-concepts.html
Lucene in 5 minutes: http://www.lucenetutorial.com/lucene-in-5-minutes.html
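To make the idea concrete before reading the tutorials, here is a minimal sketch of an inverted index, the data structure at the heart of Lucene. It is a toy written in Python for illustration and is not Lucene's API: the function names (tokenize, build_index, search), the whitespace tokenizer, and the sample documents are all assumptions made for this example.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; real analyzers do much more."""
    return text.lower().split()


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample documents, keyed by document id.
docs = {
    1: "Lucene is a search library",
    2: "Solr is a search server built on Lucene",
    3: "Nutch is a web crawler",
}
index = build_index(docs)
print(sorted(search(index, "search lucene")))  # [1, 2]
```

Looking up a term is a single dictionary access, which is why an inverted index answers queries quickly even over many documents. A real engine adds analysis (stemming, stop words), ranking, and on-disk storage on top of this idea.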
Step 3: Learn Solr / Elasticsearch
Lucene is like an engine: a library with a limited feature set that you embed in your own application when required. Solr and Elasticsearch are like cars that run on the Lucene engine.
Lucene Vs Solr: http://www.lucenetutorial.com/lucene-vs-solr.html
Glossary of terms: https://www.elastic.co/guide/en/elasticsearch/reference/1.4/glossary.html
Elasticsearch features: https://www.elastic.co/products/elasticsearch
Step 4: Learn Apache Nutch
A search engine needs data to index. To collect data from the web, we need a web crawler, and Nutch does this job.
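The basic loop of a crawler can be sketched in a few lines: fetch a page, extract its links, and queue the ones not seen yet. The example below is a toy in Python and is not Nutch's API. To stay self-contained it crawls a hypothetical in-memory dict named pages instead of making HTTP requests, and the names LinkExtractor and crawl are assumptions made for this illustration.

```python
from collections import deque
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(pages, seed):
    """Breadth-first crawl over `pages`, a dict standing in for the web."""
    seen = {seed}
    queue = deque([seed])
    order = []
    while queue:
        url = queue.popleft()
        html = pages.get(url)
        if html is None:  # unknown URL: nothing to fetch, skip it
            continue
        order.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            if link not in seen:  # never queue the same URL twice
                seen.add(link)
                queue.append(link)
    return order


# Hypothetical four-page site used in place of real HTTP fetches.
pages = {
    "/home": '<a href="/docs">Docs</a> <a href="/blog">Blog</a>',
    "/docs": '<a href="/home">Home</a> <a href="/api">API</a>',
    "/blog": '<a href="/docs">Docs</a>',
    "/api": "No links here",
}
print(crawl(pages, "/home"))  # ['/home', '/docs', '/blog', '/api']
```

A production crawler such as Nutch has to handle much more than this, for example real HTTP fetching, robots.txt rules, and scheduling, but the fetch, extract, and queue cycle is the same.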
The above is the minimum knowledge needed before architecting any search-based solution.