MongoDB Notes

Write Concern:
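Acknowledged write (w=1), journal (j=false): fast response; small window in which an acknowledged write can be lost if the server crashes before the journal reaches disk.
Acknowledged write (w=1), journal (j=true): slower response; the write is in the on-disk journal before it is acknowledged, so that window is closed.
Unacknowledged write (w=0): the client does not wait for the server to respond.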
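
The three settings above can be written down as write-concern documents. This is a minimal sketch in plain Python that only builds the settings and a note on the trade-off; it does not connect to a server, and the helper name `write_concern` is made up for illustration.

```python
# Illustrative only: builds the write-concern settings from the notes
# as plain dicts. Nothing here talks to a MongoDB server.

def write_concern(w, j=False):
    """Return (write-concern document, short note on the trade-off)."""
    if w == 0:
        note = "unacknowledged: client does not wait for the server"
    elif j:
        note = "acknowledged after the journal is on disk: slower, durable"
    else:
        note = "acknowledged before the journal is on disk: faster, small loss window"
    doc = {"w": w}
    if w != 0:
        doc["j"] = j  # journaling only matters for acknowledged writes
    return doc, note


for w, j in [(1, False), (1, True), (0, False)]:
    print(write_concern(w, j))
```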

In a replicated environment there are more variables, such as how many members must acknowledge the write.

——-
Network Errors

Use case 1: A write is sent to the DB and reaches disk, but because of a network error the client never receives the response.
Option 1: If repeating the write is harmless for this case, try to write one more time.
Option 2: In a sensitive case, read the data back to check whether it was written, and act accordingly.
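
The two options can be combined: read back first, and retry only if the write never arrived. The sketch below uses a hypothetical in-memory stand-in (`FlakyCollection`) that stores the document but loses the first reply; it is not a real driver API.

```python
class FlakyCollection:
    """Hypothetical in-memory collection whose first reply is lost."""

    def __init__(self):
        self.docs = {}
        self.drop_next_reply = True

    def insert(self, doc):
        if doc["_id"] in self.docs:
            raise KeyError("duplicate _id")
        self.docs[doc["_id"]] = doc  # the write reaches "disk"
        if self.drop_next_reply:
            self.drop_next_reply = False
            # ...but the client never hears back
            raise ConnectionError("reply lost")

    def find_one(self, _id):
        return self.docs.get(_id)


def safe_insert(coll, doc):
    """After a network error, read back before deciding whether to retry."""
    try:
        coll.insert(doc)
        return "written"
    except ConnectionError:
        # Option 2: check whether the first attempt actually succeeded.
        if coll.find_one(doc["_id"]) is not None:
            return "already written"
        # Option 1: the write never arrived, so try one more time.
        coll.insert(doc)
        return "written on retry"


coll = FlakyCollection()
result = safe_insert(coll, {"_id": 1, "name": "alice"})
print(result)
```

A blind retry here would have hit the duplicate `_id` check, which is why the read-back step matters in the sensitive case.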

——-
Replication

Availability
Fault Tolerance

To elect a primary, a majority of the voting members must be reachable, so a replica set should have an odd number of voting members.
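
A little arithmetic shows why odd sizes are preferred. In this sketch (helper names are illustrative), going from 3 to 4 voting members raises the required majority but does not let the set survive any more failures.

```python
def majority(voting_members):
    """Votes needed to elect a primary."""
    return voting_members // 2 + 1


def fault_tolerance(voting_members):
    """Members that can be lost while a majority is still reachable."""
    return voting_members - majority(voting_members)


for n in range(3, 8):
    print(n, "members: majority", majority(n), "tolerates", fault_tolerance(n))
```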

—————

Types of Replica Set Nodes
1. Regular Node – Primary or Secondary
2. Arbiter Node – Only for voting purposes. No Data on it.
3. Delayed Node – It can't become the primary. It lags behind the other nodes by a configured delay, for example one hour.
4. Hidden Node – It can't become the primary. Used for analytics.
All of these node types can vote in elections.
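
As a rough illustration, the four node types map onto member options in a replica set configuration. The sketch below writes such a configuration as a Python dict; the set name and host names are placeholders, and `secondaryDelaySecs` is the field name in MongoDB 5.0 and later (older versions call it `slaveDelay`).

```python
# Hypothetical replica set configuration as a Python dict.
# The set name and all host names are placeholders.
replica_set_config = {
    "_id": "rs0",
    "members": [
        # Regular nodes: may be primary or secondary.
        {"_id": 0, "host": "db0.example.net:27017"},
        {"_id": 1, "host": "db1.example.net:27017"},
        # Arbiter: votes only, holds no data.
        {"_id": 2, "host": "arb.example.net:27017", "arbiterOnly": True},
        # Delayed node: one hour behind, never primary.
        {"_id": 3, "host": "delayed.example.net:27017",
         "priority": 0, "hidden": True, "secondaryDelaySecs": 3600},
        # Hidden node for analytics: never primary.
        {"_id": 4, "host": "analytics.example.net:27017",
         "priority": 0, "hidden": True},
    ],
}


def can_become_primary(member):
    """Arbiters and priority-0 members are never elected primary."""
    return not member.get("arbiterOnly", False) and member.get("priority", 1) > 0


members = replica_set_config["members"]
electable = [m["host"] for m in members if can_become_primary(m)]
voters = [m["host"] for m in members if m.get("votes", 1) > 0]
print("electable:", electable)
print("voters:", len(voters))
```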

——
Write Consistency

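By default, all writes and reads go to the primary.
An application can choose to read from a secondary.
Reads from a secondary are eventually consistent: they may return stale data until replication catches up.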

Can writes complete successfully while a failover is in progress?
No, not until a new primary has been elected.
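
Eventual consistency on secondary reads can be shown with a toy model. This is a simulation with made-up names (`TinyReplicaSet`), not how MongoDB replicates internally: writes land on the primary at once and reach the secondary only when `replicate()` runs.

```python
class TinyReplicaSet:
    """Toy model: one primary, one secondary, asynchronous replication."""

    def __init__(self):
        self.primary = {}
        self.secondary = {}
        self.oplog = []  # writes not yet applied on the secondary

    def write(self, key, value):
        # Writes always go to the primary.
        self.primary[key] = value
        self.oplog.append((key, value))

    def replicate(self):
        # The secondary catches up by applying pending writes in order.
        for key, value in self.oplog:
            self.secondary[key] = value
        self.oplog.clear()

    def read(self, key, from_secondary=False):
        node = self.secondary if from_secondary else self.primary
        return node.get(key)


rs = TinyReplicaSet()
rs.write("x", 1)
stale = rs.read("x", from_secondary=True)  # secondary has not caught up yet
rs.replicate()
fresh = rs.read("x", from_secondary=True)
print(stale, fresh)
```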

—————–

Sharding

mongos is the router: it takes care of distributing requests to the right shards.
Sharding is used for horizontal scalability.
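
The routing idea can be sketched as a lookup from shard key to shard. The example below assumes range-based partitioning with made-up split points and shard names; it illustrates the concept and is not mongos's actual implementation.

```python
from bisect import bisect_right

# Hypothetical chunk boundaries on the shard key: keys below 100 live on
# shard0, keys 100..199 on shard1, keys 200 and above on shard2.
SPLIT_POINTS = [100, 200]
SHARDS = ["shard0", "shard1", "shard2"]


def route(shard_key):
    """Return the shard whose key range contains shard_key."""
    return SHARDS[bisect_right(SPLIT_POINTS, shard_key)]


for key in (5, 100, 199, 250):
    print(key, "->", route(key))
```

Adding capacity then means adding a shard and a split point, which is what makes the scaling horizontal.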

—————–

HBase, HDFS and Hive

References
HBase with Java API: https://dzone.com/articles/handling-big-data-hbase-part-4
HBase web site: http://hbase.apache.org/
HBase wiki: http://wiki.apache.org/hadoop/Hbase
HBase Reference Guide: http://hbase.apache.org/book/book.html
HBase: The Definitive Guide: http://bit.ly/hbase-definitive-guide
Google Bigtable Paper: http://labs.google.com/papers/bigtable.html
Hadoop web site: http://hadoop.apache.org/
Hadoop: The Definitive Guide: http://bit.ly/hadoop-definitive-guide
Fallacies of Distributed Computing: http://en.wikipedia.org/wiki/Fallacies_of_Distributed_Computing
HBase lightning talk slides: http://www.slideshare.net/scottleber/hbase-lightningtalk
Sample code: https://github.com/sleberknight/basic-hbase-examples

———-

Datawarehouse implementation using Hadoop+Hbase+Hive+SpringBatch – Part 1

 

——–


Hive Manual: https://cwiki.apache.org/confluence/display/Hive/LanguageManual

——

What is Hive?: Hive is a data warehousing infrastructure built on Hadoop.
What is HBase?: It's a distributed, versioned, column-oriented NoSQL data store, modeled after Google's Bigtable. It is used to host very large tables: billions of rows *times* millions of columns.
What is Hadoop?: Hadoop provides massive scale-out and fault-tolerance capabilities for data storage and processing on commodity hardware, using the map-reduce programming paradigm.
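
To make "versioned, column-oriented" concrete, here is a toy in-memory model of the HBase data layout: row key, then column (family:qualifier), then timestamp, then value. The class and column names are made up for illustration, and nothing here uses the HBase API.

```python
class TinyTable:
    """Toy model of a versioned, column-oriented table."""

    def __init__(self):
        # row key -> "family:qualifier" -> {timestamp: value}
        self.rows = {}

    def put(self, row, column, value, timestamp):
        cells = self.rows.setdefault(row, {}).setdefault(column, {})
        cells[timestamp] = value  # older versions are kept, not overwritten

    def get(self, row, column, timestamp=None):
        """Return the newest value, or the newest at or before timestamp."""
        versions = self.rows.get(row, {}).get(column, {})
        candidates = [t for t in versions if timestamp is None or t <= timestamp]
        if not candidates:
            return None
        return versions[max(candidates)]


table = TinyTable()
table.put("user1", "info:email", "a@old.example", timestamp=1)
table.put("user1", "info:email", "a@new.example", timestamp=2)
latest = table.get("user1", "info:email")
earlier = table.get("user1", "info:email", timestamp=1)
print(latest, earlier)
```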

HBase, Hive and HDFS

Reference: http://blog.nbostech.com/2013/03/hadoop-hive-hbase-installation-on-mac-os-x/

Python code as service in Docker

File Name: calc.py

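# This runs as a service.

from flask import Flask
app = Flask(__name__)

@app.route("/sum/<int:a>/<int:b>/")
def add(a, b):
    # The <int:...> converters already pass a and b as ints.
    # Named add/total to avoid shadowing the built-in sum().
    total = a + b
    return str(total)

if __name__ == "__main__":
    # host='0.0.0.0' makes the server reachable from outside the container.
    app.run(debug=True, host='0.0.0.0')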

————
File Name: requirements.txt

flask==0.12.1
————

File Name: Dockerfile

# Use an official Python runtime as a base image. Get this version from the local system with >python --version
FROM python:2.7.10

# Set the working directory to /app
WORKDIR /app

# Copy the current directory contents into the container at /app
ADD . /app

# Install any needed packages specified in requirements.txt. These are packages used in python script.
RUN pip install -r requirements.txt

# Make port 5000 available to the world outside this container
EXPOSE 5000

# Define environment variable
ENV NAME calc

# Run calc.py when the container launches
ENTRYPOINT ["python", "calc.py"]

————
>docker build -t calc .

>docker run -p 5000:5000 -it calc

In browser
http://localhost:5000/sum/1/2/
3
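
The same check can be scripted instead of done in a browser. This is a small client sketch using only the standard library; it assumes Python 3 on the host and the container running on localhost:5000 as above, and the helper names are made up.

```python
from urllib.request import urlopen


def sum_url(a, b, base="http://localhost:5000"):
    """Build the URL for the /sum/<a>/<b>/ route of calc.py."""
    return "{}/sum/{}/{}/".format(base, a, b)


def fetch_sum(a, b, base="http://localhost:5000"):
    """Call the running service and return the result as an int."""
    with urlopen(sum_url(a, b, base)) as response:
        return int(response.read().decode("utf-8"))


if __name__ == "__main__":
    print(sum_url(1, 2))
    # fetch_sum(1, 2) only works while the container is running.
```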

Note: In production, remove debug=True and run the container in background (detached) mode, for example with docker run -d.
————–
1. This makes it easy to integrate with Docker.
2. Load balancing needs to be taken care of separately.
3. Scaling up or down based on demand is easy.
4. Patches are easy to deploy and redeploy.
All the usual advantages of microservices apply.