Greenplum Database stores and processes large amounts of data by distributing the data and processing workload across several servers or hosts. Greenplum Database is an array of individual databases based upon PostgreSQL 8.2 working together to present a single database image. The master is the entry point to the Greenplum Database system. It is the database instance to which clients connect and submit SQL statements. The master coordinates its work with the other database instances in the system, called segments, which store and process the data.
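The master/segment split can be shown with a toy model. It assumes hash distribution on a single key column. The master assigns each row to a segment by hashing its distribution key. Each segment then computes a partial result over its own rows, and the master combines those partials. `zlib.crc32`, `NUM_SEGMENTS`, and the helper names are stand-ins chosen for this sketch. They are not Greenplum's actual hash function or executor.

```python
import zlib
from collections import defaultdict

# Hypothetical sketch: crc32 stands in for Greenplum's real distribution hash.
NUM_SEGMENTS = 4

def segment_for(key: str, num_segments: int = NUM_SEGMENTS) -> int:
    """Pick the segment that owns a row by hashing its distribution key."""
    return zlib.crc32(key.encode("utf-8")) % num_segments

def distribute(rows, key_column, num_segments=NUM_SEGMENTS):
    """Master-side step: send each row to the segment that owns its key."""
    segments = defaultdict(list)
    for row in rows:
        segments[segment_for(row[key_column], num_segments)].append(row)
    return segments

def parallel_sum(segments, value_column):
    """Each segment sums its own rows; the master adds up the partial sums."""
    partials = [sum(r[value_column] for r in seg_rows) for seg_rows in segments.values()]
    return sum(partials)

rows = [
    {"customer": "alice", "amount": 10},
    {"customer": "bob", "amount": 25},
    {"customer": "carol", "amount": 5},
    {"customer": "alice", "amount": 7},
]
segs = distribute(rows, "customer")
print(parallel_sum(segs, "amount"))  # 47
```

Rows with the same distribution key always land on the same segment, so per-key work can be done locally on a segment.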
Apache Parquet is a columnar storage format available to any project in the Hadoop ecosystem, regardless of the choice of data processing framework, data model or programming language.
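The difference between row and columnar layout can be shown with a toy pivot. This is not the Parquet file format: it has no row groups, pages, encodings, or compression. It only shows why storing values column by column lets a query that needs one field read just that column.

```python
# Toy illustration of row-oriented vs column-oriented layout (not Parquet itself).
rows = [
    {"id": 1, "city": "Oslo", "temp": 3.5},
    {"id": 2, "city": "Lima", "temp": 19.0},
    {"id": 3, "city": "Oslo", "temp": 4.0},
]

def to_columns(records):
    """Pivot a list of row dicts into a dict of per-column value lists."""
    columns = {name: [] for name in records[0]}
    for rec in records:
        for name, value in rec.items():
            columns[name].append(value)
    return columns

columns = to_columns(rows)

# Averaging "temp" touches one contiguous list instead of every full row.
avg_temp = sum(columns["temp"]) / len(columns["temp"])
print(columns["city"])  # ['Oslo', 'Lima', 'Oslo']
print(avg_temp)
```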
Apache CouchDB™ lets you access your data where you need it by defining the Couch Replication Protocol. A variety of projects and products implement this protocol, spanning every imaginable computing environment, from globally distributed server clusters to mobile phones and web browsers. Software compatible with the Couch Replication Protocol includes PouchDB, Cloudant, and Couchbase Lite.
Store your data safely, on your own servers or with any leading cloud provider. Web and native applications love CouchDB because it speaks JSON natively and supports binary data for all your storage needs. The Couch Replication Protocol lets your data flow seamlessly between server clusters, mobile phones, and web browsers. This enables a compelling offline-first user experience while maintaining high performance and strong reliability. CouchDB comes with a developer-friendly query language, and optionally MapReduce, for simple, efficient, and comprehensive data retrieval.
ETL Design Pattern: http://www.leapfrogbi.com/2013/05/11/etl-design-patterns-the-foundation/
Standard ETL Scenarios: https://dwbi.org/etl/etl-design-pattern/57-etl-design-pattern
Step 2: Use pgAdmin to connect to PostgreSQL.
Other UI Tools: https://postgresapp.com/documentation/gui-tools.html
Step 3: Oracle vs. PostgreSQL data types.
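A few commonly cited Oracle to PostgreSQL type mappings can be captured in a small helper. The choices below are assumptions for illustration and may not suit a given schema. For example, some migrations map `NUMBER(p,0)` to `integer`/`bigint` instead of `numeric`. Oracle `DATE` maps to a timestamp because it also stores a time of day. `oracle_to_postgres` and `SIMPLE_MAP` are hypothetical names.

```python
import re

# Illustrative Oracle -> PostgreSQL type mappings; real migrations may differ.
SIMPLE_MAP = {
    "DATE": "timestamp(0)",          # Oracle DATE also stores a time of day
    "CLOB": "text",
    "NCLOB": "text",
    "LONG": "text",
    "BLOB": "bytea",
    "BINARY_FLOAT": "real",
    "BINARY_DOUBLE": "double precision",
}

def oracle_to_postgres(oracle_type: str) -> str:
    """Translate a single Oracle column type string to a PostgreSQL type."""
    t = oracle_type.strip().upper()
    if t in SIMPLE_MAP:
        return SIMPLE_MAP[t]
    # VARCHAR2(n) / NVARCHAR2(n), optionally with BYTE or CHAR length semantics
    m = re.fullmatch(r"N?VARCHAR2\((\d+)(?:\s+(?:BYTE|CHAR))?\)", t)
    if m:
        return f"varchar({m.group(1)})"
    if re.fullmatch(r"RAW\(\d+\)", t):
        return "bytea"
    # TIMESTAMP or TIMESTAMP(p)
    m = re.fullmatch(r"TIMESTAMP(?:\((\d)\))?", t)
    if m:
        return f"timestamp({m.group(1)})" if m.group(1) else "timestamp"
    # NUMBER, NUMBER(p) or NUMBER(p,s); negative scales are not handled here
    m = re.fullmatch(r"NUMBER(?:\((\d+)(?:\s*,\s*(\d+))?\))?", t)
    if m:
        precision, scale = m.groups()
        if precision is None:
            return "numeric"
        return f"numeric({precision},{scale})" if scale else f"numeric({precision})"
    raise ValueError(f"no mapping defined for {oracle_type!r}")

print(oracle_to_postgres("VARCHAR2(50 CHAR)"))  # varchar(50)
print(oracle_to_postgres("NUMBER(10,2)"))       # numeric(10,2)
```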
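Acknowledged write (w=1, j=false): fast response, but there is a small window in which an acknowledged write can be lost if the server crashes before the write reaches the journal.
Acknowledged write (w=1, j=true): slower response, but the write is committed to the on-disk journal before it is acknowledged, so this window is closed.
Unacknowledged write (w=0): the client does not wait for the server to respond, so it never learns whether the write succeeded.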
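The trade-off between these settings can be shown with a toy single-server model. Writes land in memory first and reach the journal either immediately (j=True) or on a later flush, and a crash loses anything not yet journaled. `ToyServer` and its methods are hypothetical and do not reflect MongoDB's storage-engine internals.

```python
from dataclasses import dataclass, field

@dataclass
class ToyServer:
    """Toy model: memory is lost on crash, the journal survives."""
    memory: list = field(default_factory=list)
    journal: list = field(default_factory=list)

    def write(self, doc, w=1, j=False):
        self.memory.append(doc)
        if j:
            self.journal.append(doc)  # j=True: journal before acknowledging (slower)
        if w == 0:
            return None               # unacknowledged: client does not wait
        return "ack"

    def flush_journal(self):
        """Periodic flush: closes the window for writes made with j=False."""
        for doc in self.memory:
            if doc not in self.journal:
                self.journal.append(doc)

    def crash_and_recover(self):
        """Simulate a crash: only journaled writes survive."""
        self.memory = list(self.journal)

s = ToyServer()
ack1 = s.write({"_id": 1}, w=1, j=False)  # fast, acknowledged, not yet journaled
ack2 = s.write({"_id": 2}, w=1, j=True)   # slower, acknowledged and journaled
ack3 = s.write({"_id": 3}, w=0)           # no acknowledgement at all
s.crash_and_recover()
print(ack1, ack2, ack3)  # ack ack None
print(s.memory)          # [{'_id': 2}]
```

The write with `_id` 1 was acknowledged yet lost, which is the "small window" of the w=1, j=false setting.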
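In a replicated environment, there are many more variables (for example, a write concern can require acknowledgement from several members, as with w: "majority").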
Use case 1: The application writes to the database, but because of a network error it receives no response, even though the data was written to disk.
Option 1: Depending on the case, retry the write after the failure (safe only if the write is idempotent).
Option 2: In sensitive cases, read the data back to confirm whether it was written, and act accordingly.
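Both options can be sketched against a toy store whose first write succeeds but whose acknowledgement is "lost". `FlakyStore`, `LostAckError`, and the two helpers are hypothetical. The retry in Option 1 is safe only because an upsert keyed by id is idempotent. Retrying a plain insert or an increment could duplicate the effect.

```python
class LostAckError(Exception):
    """Hypothetical error: the write reached the server but the reply was lost."""

class FlakyStore:
    """Toy key-value store whose first write loses its acknowledgement."""
    def __init__(self):
        self.docs = {}
        self.calls = 0

    def upsert(self, doc_id, doc):
        self.calls += 1
        self.docs[doc_id] = doc       # the data IS written...
        if self.calls == 1:
            raise LostAckError()      # ...but the client never hears back

    def find(self, doc_id):
        return self.docs.get(doc_id)

def write_with_retry(store, doc_id, doc, attempts=3):
    """Option 1: retry on error; relies on the upsert being idempotent."""
    for attempt in range(attempts):
        try:
            store.upsert(doc_id, doc)
            return True
        except LostAckError:
            if attempt == attempts - 1:
                raise
    return False

def write_then_verify(store, doc_id, doc):
    """Option 2: on error, read the document back to learn what happened."""
    try:
        store.upsert(doc_id, doc)
        return True
    except LostAckError:
        return store.find(doc_id) == doc

s1 = FlakyStore()
print(write_with_retry(s1, 1, {"qty": 5}), s1.calls)   # True 2
s2 = FlakyStore()
print(write_then_verify(s2, 1, {"qty": 5}), s2.calls)  # True 1
```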
To elect a primary, a majority of the voting members must be reachable, so a replica set should have an odd number of voting members.
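The majority arithmetic shows why an odd count is preferred. Adding a fourth voting member to a three-member set raises the votes needed without letting the set survive any more failures. The function names are illustrative.

```python
def majority(voting_members: int) -> int:
    """Votes needed to elect a primary: a strict majority of voting members."""
    return voting_members // 2 + 1

def failures_tolerated(voting_members: int) -> int:
    """Voting members that can be down while a primary can still be elected."""
    return voting_members - majority(voting_members)

for n in range(1, 8):
    print(n, majority(n), failures_tolerated(n))
# 3 members tolerate 1 failure; 4 members still tolerate only 1; 5 tolerate 2.
```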
Types of Replica Set Nodes
1. Regular Node – Primary or secondary; holds data.
2. Arbiter Node – Used only for voting; holds no data.
3. Delayed Node – Cannot become primary. Its data lags behind the other nodes by a configured delay (for example, one hour).
4. Hidden Node – Cannot become primary and is not visible to client applications; typically used for analytics.
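The four node types can be modeled as member documents loosely based on replica set configuration fields (`arbiterOnly`, `priority`, `hidden`, `secondaryDelaySecs`, `votes`). The hosts, values, and helper functions are hypothetical. The sketch only encodes the rules listed above: arbiters hold no data, and priority-0 members (delayed and hidden) are never elected primary.

```python
# Hypothetical member documents; field names follow replica set config fields.
members = [
    {"host": "db1:27017", "priority": 1, "votes": 1},
    {"host": "db2:27017", "priority": 1, "votes": 1},
    {"host": "arb:27017", "arbiterOnly": True, "priority": 0, "votes": 1},
    {"host": "delayed:27017", "priority": 0, "hidden": True,
     "secondaryDelaySecs": 3600, "votes": 1},   # e.g. one hour behind
    {"host": "analytics:27017", "priority": 0, "hidden": True, "votes": 1},
]

def node_type(m):
    if m.get("arbiterOnly"):
        return "arbiter"
    if m.get("secondaryDelaySecs", 0) > 0:
        return "delayed"
    if m.get("hidden"):
        return "hidden"
    return "regular"

def can_become_primary(m):
    # Arbiters hold no data; priority-0 members are never elected primary.
    return not m.get("arbiterOnly", False) and m.get("priority", 1) > 0

def holds_data(m):
    return not m.get("arbiterOnly", False)

for m in members:
    print(m["host"], node_type(m), can_become_primary(m), holds_data(m))
```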
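By default, all nodes (including arbiters) can vote in elections.
Writes always go to the primary; reads also go to the primary by default.
The application can read from secondaries by setting a read preference.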
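Routing by read preference can be sketched with four MongoDB mode names: `primary`, `primaryPreferred`, `secondary`, and `secondaryPreferred`. `nearest` is omitted. This is a simplification with hypothetical helper names. A real driver chooses among eligible members by latency rather than taking the first one. The write helper reflects that writes need a primary, so they cannot complete while none is elected.

```python
def choose_read_target(mode, primary, secondaries):
    """Simplified read routing; primary may be None (e.g. during failover)."""
    if mode == "primary":
        if primary is None:
            raise RuntimeError("no primary available")
        return primary
    if mode == "primaryPreferred":
        if primary is not None:
            return primary
        if secondaries:
            return secondaries[0]
        raise RuntimeError("no member available")
    if mode == "secondary":
        if not secondaries:
            raise RuntimeError("no secondary available")
        return secondaries[0]
    if mode == "secondaryPreferred":
        if secondaries:
            return secondaries[0]
        if primary is not None:
            return primary
        raise RuntimeError("no member available")
    raise ValueError(f"unsupported mode: {mode}")

def choose_write_target(primary):
    """Writes always go to the primary."""
    if primary is None:
        raise RuntimeError("no primary: writes cannot complete until one is elected")
    return primary

print(choose_read_target("secondaryPreferred", "db1", ["db2", "db3"]))  # db2
print(choose_write_target("db1"))                                      # db1
```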
During the time when failover is occurring, can writes successfully complete?
mongos is the query router: it sends each operation to the shard(s) holding the relevant data, based on the shard key.
Sharding is used for horizontal scalability.
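The routing idea can be sketched with range-based sharding. Each chunk owns a contiguous range of shard-key values, and the router looks up which chunk, and therefore which shard, covers a key. The chunk table, shard names, and helpers are hypothetical. MongoDB also supports hashed sharding, and queries that do not include the shard key are broadcast to all shards.

```python
import bisect

# Hypothetical chunk table: chunk i covers [LOWER_BOUNDS[i], LOWER_BOUNDS[i+1]).
CHUNK_LOWER_BOUNDS = [float("-inf"), 1000, 5000]
CHUNK_SHARDS = ["shardA", "shardB", "shardC"]

def route(shard_key_value):
    """Targeted operation: return the shard owning the key's chunk."""
    i = bisect.bisect_right(CHUNK_LOWER_BOUNDS, shard_key_value) - 1
    return CHUNK_SHARDS[i]

def route_range(lo, hi):
    """Shards a range query on [lo, hi) must touch (assumes lo < hi)."""
    first = bisect.bisect_right(CHUNK_LOWER_BOUNDS, lo) - 1
    last = bisect.bisect_left(CHUNK_LOWER_BOUNDS, hi) - 1
    return CHUNK_SHARDS[first:last + 1]

print(route(42))              # shardA
print(route(1000))            # shardB
print(route_range(500, 6000)) # ['shardA', 'shardB', 'shardC']
```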