Ontologies

What are Ontologies?

MarkLogic Database for Ontologies
https://docs.marklogic.com/guide/semantics/intro

MongoDB Performance

Explain Query: https://nosqlbooster.com/
Tips: https://docs.mongodb.com/manual/tutorial/optimize-query-performance-with-indexes-and-projections/
MongoDB Profiler: https://studio3t.com/knowledge-base/articles/mongodb-query-performance/

-o-

Connection marked as broken because of SQLSTATE(08003)

Problem Statement: The customer restarts the PostgreSQL database every night. Long-running jobs fail after the DB restart.

https://github.com/brettwooldridge/HikariCP/issues/198

Address this:
https://github.com/brettwooldridge/HikariCP/issues/1056
To save the trouble of scanning through the PostgreSQL driver release notes: support for Connection.setNetworkTimeout() was released in version 42.2.0 of the driver (https://jdbc.postgresql.org/documentation/changelog.html#version_42.2.0)

Available properties
https://stackoverflow.com/questions/26490967/how-do-i-configure-hikaricp-in-my-spring-boot-app-in-my-application-properties-f/51079239#51079239

Check when database was restarted
https://yongitz.wordpress.com/2013/12/05/getting-postgresql-servers-start-time-and-uptime/
To get the start time, execute the query below:
psql -c "SELECT pg_postmaster_start_time();"

To get the uptime, execute the query below:
psql -c "SELECT now() - pg_postmaster_start_time();"

Exception Handling for guaranteed write:

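// writeToDatabase() is a placeholder for the actual write call.
boolean writeStatus = false;
while (!writeStatus) {
    try {
        writeToDatabase();
        writeStatus = true;
    } catch (Exception ex) {
        ex.printStackTrace();
        try {
            // Sleep for 30 seconds and hope for the DB to recover.
            Thread.sleep(30_000);
        } catch (InterruptedException ie) {
            // Stop retrying if the thread is interrupted.
            Thread.currentThread().interrupt();
            break;
        }
    }
}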
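
The retry loop above runs forever if the database never comes back. A minimal sketch of a bounded variant follows; the class name RetryingWriter, the method writeWithRetry and its parameters are hypothetical names chosen for illustration, and the write itself is passed in as a Callable rather than tied to any particular driver.

```java
import java.util.concurrent.Callable;

// Hypothetical helper: retries a write until it succeeds or the attempts run out.
final class RetryingWriter {

    /**
     * Runs the write, sleeping between failed attempts.
     * Returns the number of attempts used; rethrows the last failure if every attempt fails.
     */
    static int writeWithRetry(Callable<Void> write, int maxAttempts, long sleepMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                write.call();
                return attempt;
            } catch (InterruptedException ie) {
                // Do not retry after an interrupt; restore the flag and give up.
                Thread.currentThread().interrupt();
                throw ie;
            } catch (Exception ex) {
                last = ex;
                System.err.println("Write failed (attempt " + attempt + "): " + ex.getMessage());
                if (attempt < maxAttempts) {
                    Thread.sleep(sleepMillis); // give the database time to recover
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated write that fails twice before succeeding.
        int attempts = writeWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("database unavailable");
            }
            return null;
        }, 5, 10);
        System.out.println("Succeeded after " + attempts + " attempts");
    }
}
```

Retrying is only safe when the write is idempotent, since a failed attempt may still have reached the database.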

Greenplum Database – Parallel Data Platform

Greenplum Database stores and processes large amounts of data by distributing the data and processing workload across several servers or hosts. Greenplum Database is an array of individual databases based upon PostgreSQL 8.2 working together to present a single database image. The master is the entry point to the Greenplum Database system. It is the database instance to which clients connect and submit SQL statements. The master coordinates its work with the other database instances in the system, called segments, which store and process the data.

Reference: https://greenplum.org/gpdb-sandbox-tutorials/introduction-greenplum-database-architecture/#ffs-tabbed-12
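
To make the master/segment split concrete, here is a toy sketch of spreading rows across segments by a distribution key. It is an illustration only: the class and method names are hypothetical, and String.hashCode() stands in for whatever hash function Greenplum actually uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of a master spreading rows over segments; not Greenplum's implementation.
final class SegmentDistributionSketch {

    // Picks a segment for a distribution-key value (illustrative hash only).
    static int segmentFor(String distributionKey, int segmentCount) {
        return Math.floorMod(distributionKey.hashCode(), segmentCount);
    }

    // "Master" side: assigns every row key to exactly one segment.
    static List<List<String>> distribute(List<String> keys, int segmentCount) {
        List<List<String>> segments = new ArrayList<>();
        for (int i = 0; i < segmentCount; i++) {
            segments.add(new ArrayList<>());
        }
        for (String key : keys) {
            segments.get(segmentFor(key, segmentCount)).add(key);
        }
        return segments;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("order-1", "order-2", "order-3", "order-4");
        List<List<String>> segments = distribute(keys, 3);
        for (int i = 0; i < segments.size(); i++) {
            System.out.println("segment " + i + ": " + segments.get(i));
        }
    }
}
```

Because the same key always maps to the same segment, each segment can store and process its share of the data independently.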


Columnar Storage

Apache Parquet is a columnar storage format available to any project in the Hadoop ecosystem, regardless of the choice of data processing framework, data model or programming language.

https://parquet.apache.org/
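
As a rough illustration of what "columnar" means, the sketch below pivots row-oriented records into one list per column, so that an aggregate over a single column touches only that column's values. The Row schema (name, age) and all class names are hypothetical, and this says nothing about Parquet's actual file layout or encoding.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy row-to-column pivot; real columnar formats add encoding, compression and metadata.
final class ColumnarLayoutSketch {

    // Row-oriented record: all fields of one entity are stored together.
    static final class Row {
        final String name;
        final int age;

        Row(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    // Column-oriented layout: all values of one field are stored together.
    static final class Columns {
        final List<String> names = new ArrayList<>();
        final List<Integer> ages = new ArrayList<>();
    }

    static Columns toColumns(List<Row> rows) {
        Columns columns = new Columns();
        for (Row row : rows) {
            columns.names.add(row.name);
            columns.ages.add(row.age);
        }
        return columns;
    }

    // Reads only the "age" column; the "name" column is never touched.
    static long sumAges(Columns columns) {
        long sum = 0;
        for (int age : columns.ages) {
            sum += age;
        }
        return sum;
    }

    public static void main(String[] args) {
        Columns columns = toColumns(Arrays.asList(new Row("Ann", 30), new Row("Bob", 25)));
        System.out.println("names = " + columns.names);
        System.out.println("sum of ages = " + sumAges(columns));
    }
}
```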

CouchDB

Apache CouchDB™ lets you access your data where you need it by defining the Couch Replication Protocol that is implemented by a variety of projects and products that span every imaginable computing environment from globally distributed server-clusters, over mobile phones to web browsers. Software that is compatible with the Couch Replication Protocol includes PouchDB, Cloudant, and Couchbase Lite.

Store your data safely, on your own servers, or with any leading cloud provider. Your web- and native applications love CouchDB, because it speaks JSON natively and supports binary for all your data storage needs. The Couch Replication Protocol lets your data flow seamlessly between server clusters to mobile phones and web browsers, enabling a compelling, offline-first user-experience while maintaining high performance and strong reliability. CouchDB comes with a developer-friendly query language, and optionally MapReduce for simple, efficient, and comprehensive data retrieval.
Reference: http://couchdb.apache.org/

https://db-engines.com/en/system/CouchDB%3BCouchbase%3BMongoDB

Dimension vs Fact Tables

https://en.wikipedia.org/wiki/Dimension_(data_warehouse)

https://en.wikipedia.org/wiki/Fact_table

https://network.informatica.com/thread/42342

——

ETL Design Pattern: http://www.leapfrogbi.com/2013/05/11/etl-design-patterns-the-foundation/

Standard ETL Scenarios: https://dwbi.org/etl/etl-design-pattern/57-etl-design-pattern

#data-warehouse

PostgreSQL

Step 1: Download
https://www.enterprisedb.com/downloads/postgres-postgresql-downloads#macosx

Step 2: Use pgAdmin to connect to PostgreSQL
https://www.pgadmin.org/

Other UI Tools: https://postgresapp.com/documentation/gui-tools.html

Step 3: Oracle vs PostgreSQL data types.
http://www.sqlines.com/oracle-to-postgresql

MongoDB Connection Pool

https://stackoverflow.com/questions/41271707/java-mongodb-connection-pool

MongoDB Notes

Write Concern:
Acknowledged write (w=1), DB setting false: fast response, small chance of a missing write
Acknowledged write (w=1), DB setting true: slow response, no error
Unacknowledged write (w=0): does not wait for the server to respond

In a replicated environment, there are many more variables.

——-
Network Errors

Use case 1: Write to the DB. Due to a network error, the system did not respond, but the data was written to disk.
Option 1: Depending on the case, retry the write one more time after the failure.
Option 2: In sensitive cases, read the data back to check whether it was written, and act accordingly.
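
The two options above can be combined: on a network error, read the data back first and write again only if it is missing. The sketch below assumes a hypothetical Store interface and an in-memory FlakyStore that applies the write but loses the acknowledgement; none of these names come from a MongoDB driver.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Sketch of "verify, then retry" against a hypothetical store interface.
final class VerifyThenRetrySketch {

    interface Store {
        void write(String id, String value) throws IOException;
        String read(String id) throws IOException;
    }

    // Returns how the write was confirmed: ACKNOWLEDGED, VERIFIED or RETRIED.
    static String safeWrite(Store store, String id, String value) throws IOException {
        try {
            store.write(id, value);
            return "ACKNOWLEDGED";
        } catch (IOException networkError) {
            // Option 2: the write may have reached disk, so check before writing again.
            if (value.equals(store.read(id))) {
                return "VERIFIED";
            }
            // Option 1: the data is not there, so write one more time.
            store.write(id, value);
            return "RETRIED";
        }
    }

    // Test double: applies the first write, then fails before acknowledging it.
    static final class FlakyStore implements Store {
        final Map<String, String> data = new HashMap<>();
        int appliedWrites = 0;
        boolean failNextAck = true;

        @Override
        public void write(String id, String value) throws IOException {
            data.put(id, value);
            appliedWrites++;
            if (failNextAck) {
                failNextAck = false;
                throw new IOException("connection reset before acknowledgement");
            }
        }

        @Override
        public String read(String id) {
            return data.get(id);
        }
    }

    public static void main(String[] args) throws IOException {
        FlakyStore store = new FlakyStore();
        System.out.println(safeWrite(store, "42", "hello") + ", writes applied: " + store.appliedWrites);
    }
}
```

Reading back before retrying avoids applying the same write twice, which matters when the write is not idempotent (an insert or an increment, for example).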

——-
Replication

Availability
Fault Tolerance

To elect a primary, the replica set should have an odd number of servers.
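
The arithmetic behind the odd-number rule is short: a primary needs votes from a strict majority of the members, so an extra even member raises the majority without raising the number of failures the set can survive. The class and method names below are illustrative only.

```java
// Majority arithmetic for a replica set with a given number of voting members.
final class ReplicaSetMajority {

    // Smallest number of votes that is more than half of the members.
    static int majority(int votingMembers) {
        return votingMembers / 2 + 1;
    }

    // Members that can be down while a majority is still reachable.
    static int toleratedFailures(int votingMembers) {
        return votingMembers - majority(votingMembers);
    }

    public static void main(String[] args) {
        for (int members = 3; members <= 5; members++) {
            System.out.println(members + " members: majority = " + majority(members)
                    + ", tolerated failures = " + toleratedFailures(members));
        }
    }
}
```

With 3 members the majority is 2 and one failure is tolerated; with 4 members the majority is 3 and still only one failure is tolerated; 5 members tolerate two.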

—————

Types of Replica Set Nodes
1. Regular Node – Primary or Secondary
2. Arbiter Node – Only for voting purposes. No Data on it.
3. Delayed Node – It can’t become the primary node. Its data lags behind the other nodes, for example by one hour.
4. Hidden Node – It can’t become the primary node. Used for analytics.
All nodes can participate in elections.

——
Write Consistency

By default, writes and reads always go to the primary.
An application can choose to read from a secondary.
Reads from a secondary are eventually consistent.

During the time when failover is occurring, can writes successfully complete?
No

—————–

Sharding

mongos is the router. It takes care of distributing requests to the shards.
Sharding is used for horizontal scalability
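
As a rough picture of what a router does, the sketch below maps a shard key to a shard by key range. It is a simplified toy with hypothetical names and hard-coded ranges, and it is not how mongos itself is implemented.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy range-based router: each range is identified by its inclusive lower bound.
final class ShardRouterSketch {

    private final TreeMap<String, String> rangeStarts = new TreeMap<>();

    void addRange(String lowerBoundInclusive, String shard) {
        rangeStarts.put(lowerBoundInclusive, shard);
    }

    // Finds the range with the greatest lower bound that is <= the shard key.
    String shardFor(String shardKey) {
        Map.Entry<String, String> range = rangeStarts.floorEntry(shardKey);
        if (range == null) {
            throw new IllegalArgumentException("no range covers key: " + shardKey);
        }
        return range.getValue();
    }

    public static void main(String[] args) {
        ShardRouterSketch router = new ShardRouterSketch();
        router.addRange("", "shardA");  // keys before "h"
        router.addRange("h", "shardB"); // keys from "h" up to, but not including, "p"
        router.addRange("p", "shardC"); // keys from "p" onwards
        for (String key : new String[] {"alice", "mongo", "zoe"}) {
            System.out.println(key + " -> " + router.shardFor(key));
        }
    }
}
```

Adding a shard then means adding or splitting a range, which is what makes this kind of scaling horizontal: capacity grows by adding servers rather than by enlarging one.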

—————–