MongoDB Notes

Write Concern:
Acknowledged write – w:1, journal:false – fast response – small risk of a missing write if the server crashes before journaling
Acknowledged write – w:1, journal:true – slower response – no such error, the write is durable on disk
Unacknowledged write – w:0 – do not wait for the server to respond at all

In a replicated environment, there are many more variables.

Network Errors

Use case 1: A write is sent to the DB, but due to a network error the system never responds, even though the data was written to disk.
Option 1: Depending on the case, retry the write.
Option 2: In sensitive cases, read the data back to confirm it was written and act accordingly.
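Both options can be sketched as one generic retry helper (plain Python; the names are illustrative, not a driver API):

```python
def write_with_retry(write, read_back, retries=3):
    """write() performs the insert; read_back() returns True when the
    document is already present (Option 2's verification read)."""
    for _ in range(retries):
        try:
            write()
            return True
        except ConnectionError:
            # The write may have reached the disk even though the ack was lost
            if read_back():
                return True
    return False

# Simulated flaky server: data lands on disk but the first ack is lost
state = {"stored": False, "calls": 0}
def flaky_write():
    state["calls"] += 1
    state["stored"] = True
    if state["calls"] == 1:
        raise ConnectionError("ack lost")

print(write_with_retry(flaky_write, lambda: state["stored"]))  # True
```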


Fault Tolerance

To elect a primary, the replica set needs a majority of votes, so an odd number of servers is recommended.
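The arithmetic behind the odd-number rule (a small illustration, not MongoDB code):

```python
def majority(n):
    # Votes needed to win a primary election
    return n // 2 + 1

def failures_tolerated(n):
    # How many members can be lost while a majority can still be formed
    return n - majority(n)

# Adding a 4th node does not improve fault tolerance over 3 nodes:
print(failures_tolerated(3), failures_tolerated(4), failures_tolerated(5))  # 1 1 2
```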


Types of Replica Set Nodes
1. Regular Node – Primary or Secondary
2. Arbiter Node – Only for voting purposes. No Data on it.
3. Delayed Node – Cannot become primary. It applies updates about one hour behind the other nodes (the delay is configurable), which protects against accidental destructive writes.
4. Hidden Node – Cannot become primary. It is invisible to the application; typically used for analytics.
All of these node types can participate in elections.
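As a sketch, the four node types map onto member options in the replica-set configuration document, shown here as a Python dict. Host names are placeholders, and the delay field is `secondaryDelaySecs` in newer MongoDB versions (`slaveDelay` in older ones), so treat the details as assumptions:

```python
# Replica-set configuration document (the shape passed to replSetInitiate);
# hosts are placeholders.
rs_config = {
    "_id": "rs0",
    "members": [
        {"_id": 0, "host": "db0:27017"},                        # regular node
        {"_id": 1, "host": "db1:27017"},                        # regular node
        {"_id": 2, "host": "arb0:27017", "arbiterOnly": True},  # votes only, no data
        {"_id": 3, "host": "delayed0:27017", "priority": 0,     # can't become primary
         "hidden": True, "secondaryDelaySecs": 3600},           # one hour behind
        {"_id": 4, "host": "hidden0:27017", "priority": 0,      # can't become primary
         "hidden": True},                                       # e.g. for analytics
    ],
}

# Five voting members: an odd number, so elections can always reach a majority
print(len(rs_config["members"]))  # 5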

Write Consistency

Writes always go to the primary; by default, reads do as well.
The application can choose to read from secondaries instead (read preference).
Reading from secondaries gives eventual consistency, since replication lags behind the primary.

During the time when a failover is occurring, can writes successfully complete? No – writes fail until a new primary is elected, so the application must handle or retry them.



mongos is the router: it takes care of distributing reads and writes to the correct shards.
Sharding is used for horizontal scalability.


HBase, HDFS and Hive

HBase with Java API – references:
HBase web site
HBase wiki
HBase Reference Guide
HBase: The Definitive Guide
Google Bigtable paper
Hadoop web site
Hadoop: The Definitive Guide
Fallacies of Distributed Computing
HBase lightning talk slides
Sample code


Data warehouse implementation using Hadoop + HBase + Hive + Spring Batch – Part 1




Hive Manual:


What is Hive?: Hive is a data warehousing infrastructure based on Hadoop.
What is HBase?: It is a distributed, versioned, column-oriented NoSQL data store, modeled after Google's Bigtable, used to host very large tables – billions of rows times millions of columns.
What is Hadoop?: Hadoop provides massive scale-out and fault-tolerance capabilities for data storage and processing on commodity hardware, using the map-reduce programming paradigm.
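A minimal illustration of the map-reduce paradigm Hadoop uses – a word count written as a mapper and a reducer, simulated locally in plain Python (on a real cluster the same two functions would run as Hadoop Streaming processes):

```python
from itertools import groupby
from operator import itemgetter

def mapper(line):
    # Map phase: emit (word, 1) for every word in the line
    for word in line.split():
        yield (word.lower(), 1)

def reducer(word, counts):
    # Reduce phase: sum the counts for one word
    return (word, sum(counts))

def word_count(lines):
    # The sort stands in for Hadoop's shuffle/sort between the two phases
    pairs = sorted(kv for line in lines for kv in mapper(line))
    return dict(reducer(word, (c for _, c in group))
                for word, group in groupby(pairs, key=itemgetter(0)))

print(word_count(["Hive uses Hadoop", "HBase uses Hadoop too"]))
```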

HBase, Hive and HDFS


Python code as a service in Docker

File Name:

# This runs as a service.

from flask import Flask
app = Flask(__name__)

# The URL route pattern below is an assumption
@app.route('/sum/<a>/<b>')
def sum(a, b):
    total = int(a) + int(b)
    return str(total)

if __name__ == "__main__":
    # host='0.0.0.0' (assumed) makes the service reachable from outside the container
    app.run(host='0.0.0.0', port=5000, debug=True)

File Name: requirements.txt

Flask

File Name: Dockerfile

# Use an official Python runtime as a base image. Get this version from the local system with >python --version
FROM python:2.7.10

# Set the working directory to /app
WORKDIR /app

# Copy the current directory contents into the container at /app
ADD . /app

# Install any needed packages specified in requirements.txt. These are packages used in python script.
RUN pip install -r requirements.txt

# Make port 5000 available to the world outside this container
EXPOSE 5000

# Define environment variable

# Run when the container launches
ENTRYPOINT ["python", ""]

>docker build -t calc .

>docker run -p 5000:5000 -it calc

Then test in the browser on port 5000.

Note: Remove debug=True and run in background mode in production.
1. This makes it easy to integrate with Docker.
2. Load balancing needs to be handled separately.
3. Scaling is easy based on demand.
4. Patches are easy to deploy/redeploy.
These are all the usual advantages of microservices.