Elasticsearch – ABCD

Step 1: Install Elasticsearch
Step 2: Install Chrome Sense plugin
Step 3: Try the following requests to get a feel for Elasticsearch (copy and paste them into the left pane of the Sense screen)

PUT /customer?pretty

GET /_cat/indices?v

PUT /customer/external/1
{
  "first-name": "John",
  "last-name": "Brown"
}

PUT /customer/external/2
{
  "first-name": "John",
  "last-name": "White"
}

PUT /customer/external/3
{
  "first-name": "John",
  "last-name": "Johny"
}

PUT /customer/external/4
{
  "first-name": "Johnathan",
  "last-name": "Smith"
}

PUT /customer/external/5
{
  "first-name": "JohnyJohny",
  "last-name": "YesPapa"
}

PUT /customer/external/6
{
  "first-name": "John",
  "last-name": "White Paper"
}

PUT /customer/external/7
{
  "first-name": "John",
  "last-name": ""
}

GET /customer/external/2

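# Note: this deletes the whole index, including the documents above. Re-run the PUT requests before trying the updates and the search below.
DELETE /customer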

GET /_cat/indices?v

POST /customer/external/1/_update
{
  "doc": { "name": "Jane Doe" }
}

POST /customer/external/1/_update?pretty
{
  "doc": { "name": "Jane Doe", "age": 20 }
}

POST /customer/external/1/_update?pretty
{
  "script": "ctx._source.age += 5"
}

GET /_nodes/process?pretty
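
The three _update requests above can be pictured with a small Python sketch. It is only an illustration of the effect on the stored document, not how Elasticsearch implements updates: it assumes document 1 still holds the John Brown fields, and the helper names partial_update and add_to_age are made up for this example.

```python
import copy

# Document 1 as indexed by the first PUT request (assumed to still exist).
stored = {"first-name": "John", "last-name": "Brown"}

def partial_update(source, doc):
    """Simplified "doc" update: copy the source, then add or overwrite top-level fields."""
    updated = copy.deepcopy(source)
    updated.update(doc)
    return updated

def add_to_age(source, amount):
    """Stand-in for the script "ctx._source.age += 5" (hypothetical helper)."""
    updated = copy.deepcopy(source)
    updated["age"] += amount
    return updated

step1 = partial_update(stored, {"name": "Jane Doe"})
step2 = partial_update(step1, {"name": "Jane Doe", "age": 20})
step3 = add_to_age(step2, 5)

print(step3)
```

The existing first-name and last-name fields survive each step; only the fields named in "doc" are added or changed, and the script then raises age from 20 to 25.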

----------------------------------

GET /customer/_search
{
  "query": {
    "match": {
      "first-name": "John"
    }
  }
}

GET /customer/_stats/
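
To see why the match query on "first-name" returns some of the sample documents and not others, here is a toy inverted index in Python. It is a rough sketch, assuming the seven sample documents are indexed and that the field is analyzed by lowercasing and splitting into whole words; the function names are invented for this example and real Elasticsearch analysis and scoring do much more.

```python
import re
from collections import defaultdict

# The first-name values of the seven sample documents, keyed by document id.
FIRST_NAMES = {
    1: "John", 2: "John", 3: "John", 4: "Johnathan",
    5: "JohnyJohny", 6: "John", 7: "John",
}

def analyze(text):
    """Toy analysis: lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index

def match(index, query):
    """Return ids of documents containing at least one analyzed query term."""
    hits = set()
    for term in analyze(query):
        hits |= index.get(term, set())
    return sorted(hits)

index = build_index(FIRST_NAMES)
print(match(index, "John"))  # [1, 2, 3, 6, 7]
```

Documents 4 and 5 are not returned because "johnathan" and "johnyjohny" are different terms from "john"; the match query compares whole terms, not prefixes.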


Lucene / Elasticsearch Analyzers

In Lucene, an analyzer is a combination of a tokenizer (splitter), a stemmer, and a stopword filter.

In Elasticsearch, an analyzer is a combination of:

1. Character filter: “tidies up” a string before it is tokenized. Example: removing HTML tags.
2. Tokenizer: an analyzer must have exactly one tokenizer. It breaks the string up into individual terms or tokens.
3. Token filter: changes, adds, or removes tokens. A stemmer is a token filter that reduces a word to its base form, for example: “happy”, “happiness” => “happi” (Snowball demo).
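
The three stages listed above can be sketched in plain Python. This is only a toy model of the pipeline, not Elasticsearch or Lucene code: the function names and the stopword list are assumptions for this example, and toy_stem is a two-rule stand-in, not the Snowball algorithm.

```python
import re

def strip_html(text):
    """Character filter: replace HTML tags with spaces before tokenizing."""
    return re.sub(r"<[^>]+>", " ", text)

def tokenize(text):
    """Tokenizer: break the string into alphanumeric word tokens."""
    return re.findall(r"[A-Za-z0-9]+", text)

def lowercase(tokens):
    """Token filter: change tokens by lowercasing them."""
    return [t.lower() for t in tokens]

STOPWORDS = {"a", "an", "the", "is", "of"}  # tiny illustrative list

def remove_stopwords(tokens):
    """Token filter: remove tokens that are stopwords."""
    return [t for t in tokens if t not in STOPWORDS]

def toy_stem(tokens):
    """Token filter: toy stemmer (NOT Snowball), strips 'ness' and maps a final 'y' to 'i'."""
    stemmed = []
    for t in tokens:
        if t.endswith("ness"):
            t = t[:-4]
        if t.endswith("y"):
            t = t[:-1] + "i"
        stemmed.append(t)
    return stemmed

def analyze(text):
    """Run one character filter, exactly one tokenizer, then the token filters in order."""
    tokens = tokenize(strip_html(text))
    for token_filter in (lowercase, remove_stopwords, toy_stem):
        tokens = token_filter(tokens)
    return tokens

print(analyze("<p>The <b>Happy</b> dog is full of happiness</p>"))
# ['happi', 'dog', 'full', 'happi']
```

Both “Happy” and “happiness” end up as the same token, which is what lets a search for one find documents containing the other.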

Reference:
https://www.elastic.co/guide/en/elasticsearch/guide/current/custom-analyzers.html
http://stackoverflow.com/questions/12836642/analyzers-in-elasticsearch

Demo:
http://snowball.tartarus.org/demo.php

Reference:
https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-snowball-analyzer.html
https://www.elastic.co/guide/en/elasticsearch/guide/current/analysis-intro.html

All About Analyzers:
https://www.elastic.co/blog/found-text-analysis-part-1
https://www.elastic.co/blog/found-text-analysis-part-2

Testing Lucene Analyzers with elasticsearch
http://jontai.me/blog/2012/10/testing-lucene-analyzers-with-elasticsearch/
“Here’s an awesome plugin on github repo. It’s somewhat extension of Analyze API. Found it on official elastic plugin list.

What’s great is that it shows tokens with all their attributes after every single step. With this it is easy to debug analyzer configuration and see why we got such tokens and where we lost the ones we wanted.”
https://github.com/johtani/elasticsearch-extended-analyze
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-analyze.html

Implementation of Search in Enterprise Applications

Step 1: Learn about Text Search
Learning the fundamentals is always important before dealing with tools and software.

Text Retrieval and Search Engines
by ChengXiang Zhai

Thanks to the University of Illinois and Mr. ChengXiang Zhai.

(Diagram: text retrieval system)

Text Retrieval System:
(Diagram: text retrieval architecture)

Copyright on the above diagrams belongs to the University of Illinois.

Please go through the course to understand all of the above.

Step 2: Learn Lucene
Lucene was written in Java by Doug Cutting in 1999. It encapsulates many of the algorithms covered in the course above.
Lucene is the middle name of Doug Cutting’s wife.

Reference:
Features: https://lucene.apache.org/core/
Lucene Tutorial: http://www.lucenetutorial.com/
Lucene Concepts: http://www.lucenetutorial.com/basic-concepts.html
Lucene in 5 minutes: http://www.lucenetutorial.com/lucene-in-5-minutes.html

Step 3: Learn Solr / Elasticsearch
Lucene is like an engine: it can be embedded where needed, but offers limited features on its own. Solr and Elasticsearch are like cars that run on that engine.

Reference:
Lucene Vs Solr: http://www.lucenetutorial.com/lucene-vs-solr.html
Glossary of terms: https://www.elastic.co/guide/en/elasticsearch/reference/1.4/glossary.html
Elasticsearch features: https://www.elastic.co/products/elasticsearch

Step 4: Learn Apache Nutch
A search engine needs data to index. To collect data from the web, we need a web crawler, and Nutch does this job.

Reference:
https://en.wikipedia.org/wiki/Apache_Nutch
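
To make the crawler's role in Step 4 concrete, here is a minimal Python sketch of a single crawl step: take a fetched page, pull out the text to index, and collect the links to visit next. It is not Nutch code and does no network access; the PageParser class, the parse_page helper, and the sample page and URLs are all made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class PageParser(HTMLParser):
    """Collects visible text and absolute link URLs from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        # Resolve every <a href="..."> against the page URL.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))

    def handle_data(self, data):
        # Keep non-empty text; this is what would be sent to the index.
        if data.strip():
            self.text_parts.append(data.strip())

def parse_page(base_url, html):
    """Return (text to index, links to crawl next) for one page."""
    parser = PageParser(base_url)
    parser.feed(html)
    return " ".join(parser.text_parts), parser.links

# A hard-coded page stands in for a real HTTP fetch.
PAGE = ('<html><body><h1>Search</h1><a href="/docs">Docs</a> '
        '<a href="http://example.org/x">X</a></body></html>')
text, links = parse_page("http://example.com/", PAGE)
print(text)
print(links)
```

A real crawler such as Nutch repeats this step over the collected links, while also handling scheduling, politeness rules, and duplicate detection, which this sketch leaves out.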

The knowledge above is the minimum needed before architecting any search-based solution.

-o-