JSON plugin for Eclipse

When working with sensitive data (PHI, PII, etc.), it is better not to use online JSON formatters and validators, since pasted data may be sent to a third-party server. They also have limitations.

A better option is the JSON plugin for Eclipse:
https://github.com/boothen/Json-Eclipse-Plugin

This plugin adds support for JSON files to Eclipse. You can install the latest version directly from this update site:

http://boothen.github.io/Json-Eclipse-Plugin/
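
For a quick check outside Eclipse, JSON can also be validated and pretty-printed locally, so the data never leaves the machine. The following is a minimal sketch using only Python's standard json module; the function names format_json and check_json are made up for illustration and are not part of the plugin.

```python
import json


def format_json(text: str) -> str:
    """Parse JSON text and return it pretty-printed.

    Raises json.JSONDecodeError (a ValueError) if the text is not valid JSON.
    """
    parsed = json.loads(text)
    # sort_keys gives stable output; ensure_ascii=False keeps non-ASCII text readable.
    return json.dumps(parsed, indent=2, sort_keys=True, ensure_ascii=False)


def check_json(text: str):
    """Return (True, "") for valid JSON, or (False, error location and message)."""
    try:
        json.loads(text)
        return True, ""
    except json.JSONDecodeError as err:
        return False, f"line {err.lineno}, column {err.colno}: {err.msg}"


if __name__ == "__main__":
    print(format_json('{"b": 1, "a": [1, 2]}'))
    print(check_json('{"a": }'))
```

For one-off use on a file, the standard library also ships a command-line formatter: python -m json.tool file.json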

Dealing with Files in Hadoop

Use case: We have 1 million files to process, and users must be able to download them.

Hadoop is meant to bring the processing to the data. We can store the processed file content or metadata in HBase to support easy search. After a successful search, the user wants to see the original document, and at that point the file can be downloaded directly from NAS.

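HDFS: This is not meant to store small files. The default block size is 128 MB (64 MB in Hadoop 1.x), and every file, however small, takes up metadata in NameNode memory. HDFS can be made to hold many small files, but that is not what it is designed for.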

HBase: The default block size is 64 KB. We can tweak it, but HBase is not meant to store proprietary data formats.

NAS: Network Attached Storage makes it easy to store and retrieve the original files when the work is not MapReduce-style processing.
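
The search-then-download flow described above can be sketched roughly as follows. This is an illustration under stated assumptions, not a real HBase client: MetadataIndex is a hypothetical in-memory stand-in for the HBase table, the keyword matching is deliberately naive, and the NAS is assumed to be an ordinary mounted directory.

```python
import shutil


class MetadataIndex:
    """Hypothetical in-memory stand-in for an HBase table of searchable metadata."""

    def __init__(self):
        # doc_id -> {"keywords": set of lowercase terms, "nas_path": original file location}
        self._rows = {}

    def put(self, doc_id, keywords, nas_path):
        """Store extracted keywords plus a pointer to the original file on NAS."""
        self._rows[doc_id] = {
            "keywords": {k.lower() for k in keywords},
            "nas_path": nas_path,
        }

    def search(self, keyword):
        """Return the sorted ids of documents whose metadata contains the keyword."""
        kw = keyword.lower()
        return sorted(d for d, row in self._rows.items() if kw in row["keywords"])

    def nas_path(self, doc_id):
        """Look up where the original document lives on NAS."""
        return self._rows[doc_id]["nas_path"]


def download(index, doc_id, dest_dir):
    """Copy the original file from the (assumed mounted) NAS path into dest_dir.

    Returns the path of the copied file.
    """
    return shutil.copy(index.nas_path(doc_id), dest_dir)
```

The point of the design is that only small, searchable metadata goes into the index, while the original file stays on NAS and is fetched by path only after a search hit.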

Text Processing Architecture

OpenText Search Server
http://www.opentext.com/what-we-do/industries/legal/legal-content-management-edocs/opentext-search-server-edocs-edition

Noggle
https://www.noggle.online/knowledgebase/cognitive-search-engine/

Cognitive Search Is The AI Version Of Enterprise Search
http://blogs.forrester.com/mike_gualtieri/17-06-12-cognitive_search_is_the_ai_version_of_enterprise_search

Elasticsearch: The Definitive Guide
https://www.elastic.co/guide/en/elasticsearch/guide/current/index.html