Learn this to solve Big Problems

https://www.coursera.org
Hadoop Platform and Application Framework
by University of California, San Diego
————-
https://university.mongodb.com/courses/MongoDB/
M101J: MongoDB for Java Developers

http://orientdb.com/docs/3.0.x/

————-
https://www.elastic.co/
https://polimetlase.wordpress.com/?s=elasticsearch

————-
CDAP
http://cask.co/products/cdap/

Hortonworks Data Platform
https://hortonworks.com/

————-

Hive Modeling / Hive Queries
https://hive.apache.org/

HDFS
https://hadoop.apache.org/docs/r2.7.3/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html
————-
MapReduce
https://hadoop.apache.org/docs/r2.7.3/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html
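
A rough sketch of the MapReduce idea, to make the tutorial above easier to approach. This is plain Python running in one process, not the Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for illustration, and a real job would distribute these steps across a cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values collected for one key into a total."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data big problems", "big clusters"]))
```

The classic word count is used here only because it keeps the three phases visible; the same map/shuffle/reduce split applies to other aggregations.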

http://spark.apache.org/docs/latest/
Spark Scala API (Scaladoc)
Spark Java API (Javadoc)
Spark Python API (Sphinx)
Spark R API (Roxygen2)

http://twill.apache.org/
Apache Twill is an abstraction over Apache Hadoop® YARN that reduces the complexity of developing distributed applications, allowing developers to focus instead on their application logic. Apache Twill allows you to use YARN’s distributed capabilities with a programming model that is similar to running threads.
————-
https://tika.apache.org/
The Apache Tika™ toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF). All of these file types can be parsed through a single interface, making Tika useful for search engine indexing, content analysis, translation, and much more.

https://nifi.apache.org/
Apache NiFi is an easy-to-use, powerful, and reliable system to process and distribute data.

————-
https://www.docker.com/
Docker is an open platform for developers and sysadmins to build, ship, and run distributed applications, whether on laptops, data center VMs, or the cloud.

https://nixos.org
Nix, used to package Python and C++ code into Docker images.

————-
Domain Knowledge:
Digital Asset Management (DAM)
https://polimetlase.wordpress.com/2017/03/20/digital-asset-management/
PRISM – https://polimetlase.wordpress.com/2017/03/10/categorize-and-search-documents/

————-

Apache Kafka: A Distributed Streaming Platform.
https://kafka.apache.org/

https://flume.apache.org/
Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. It has a simple and flexible architecture based on streaming data flows. It is robust and fault tolerant with tunable reliability mechanisms and many failover and recovery mechanisms. It uses a simple extensible data model that allows for online analytic application.
————-

git
jira
wiki
————-
