OCUP 2 Certification Guide: Preparing for the OMG Certified UML 2.5 Professional 2 Foundation Exam
by Michael Jesse Chonoles
—– SEI CMM Software Architecture Series Books
Software Architecture in Practice (3rd Edition) (SEI Series in Software Engineering)
by Len Bass et al.
Designing Software Architectures: A Practical Approach (SEI Series in Software Engineering)
by Humberto Cervantes et al.
Evaluating Software Architectures: Methods and Case Studies (SEI Series in Software Engineering)
by Paul Clements
Note: As senior architects, we should know how to evaluate, compare, and understand existing projects.
Documenting Software Architectures: Views and Beyond (2nd Edition)
by Paul Clements et al.
TOGAF Version 9.1
by Van Haren Publishing
Note: TOGAF certification helps a lot.
Patterns: These are important to know and easy to re-use
Patterns of Enterprise Application Architecture
by Martin Fowler
Security Patterns in Practice: Designing Secure Architectures Using Software Patterns
by Eduardo Fernandez-Buglioni
Head First Design Patterns: A Brain-Friendly Guide
by Eric Freeman et al.
Note: Look for the latest edition
Enterprise Integration Patterns: Designing, Building, and Deploying Messaging Solutions
by Gregor Hohpe et al.
Domain-Driven Design: Tackling Complexity in the Heart of Software
by Eric Evans
Note: This is very important when working in specific domains such as Finance, Media, Auto, Insurance, etc.
Service Design Patterns: Fundamental Design Solutions for SOAP/WSDL and RESTful Web Services
by Robert Daigneau
NoSQL and SQL Data Modeling: Bringing Together Data, Semantics, and Software
by Ted Hills
Database Design Using Entity-Relationship Diagrams, Second Edition (Foundations of Database Design)
by Sikha Bagui et al.
Note: Buy a similar book
The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling
by Ralph Kimball et al.
You must have worked with at least one technology in each category:
RDBMS (Oracle, MS SQL, Postgres, MySQL, etc.)
NoSQL (MongoDB, MarkLogic, etc.)
Services (Java services, Spring, Python Flask, etc.)
UI (Node.js, React, Angular, HTML, JS, CSS)
Reporting (Jasper, Tableau)
Data Warehousing Fundamentals
ETL (Informatica, Apache NiFi, etc.)
OS (Linux: Red Hat, Ubuntu, etc.)
Messaging (JMS, Kafka, etc.)
Cloud (AWS, Cloudera, IBM, MS Azure, etc.)
SEI CMM Process
Big Data (Hadoop, etc.)
Note: If you are coming from a non-software background, please complete an MS or BS in Computer Science, or get the textbooks from a BS program and study them in your free time.
Bible for Software Engineers
Software Engineering: A Practitioner’s Approach
by Roger S. Pressman et al.
Foundations of Software Testing ISTQB Certification
by Rex Black et al.
Learning Selenium Testing Tools – Third Edition
by Raghavendra Prasad MG
Note: Check other books on Selenium as well.
—– Project Management
PMP Exam Prep, Eighth Edition – Updated: Rita’s Course in a Book for Passing the PMP Exam
by Rita Mulcahy
Essential Scrum: A Practical Guide to the Most Popular Agile Process (Addison-Wesley Signature Series (Cohn))
by Kenneth S. Rubin
Requirements / UX
Lean UX: Designing Great Products with Agile Teams
by Jeff Gothelf et al.
Software Requirements (3rd Edition) (Developer Best Practices)
by Karl Wiegers et al.
Backend as a Service or “BaaS”
Function as a Service or “FaaS” (Example: http://openwhisk.incubator.apache.org/)
AWS Lambda: https://aws.amazon.com/lambda/
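With FaaS, the unit of deployment is a single function that the platform invokes once per event, which is why lightweight systems are quick to build and ship this way. A minimal sketch in Python, assuming AWS Lambda's Python handler convention of a function taking `(event, context)`; the name `lambda_handler`, the `name` event field, and the HTTP-style response shape are illustrative assumptions.

```python
import json


def lambda_handler(event, context):
    """Entry point that the FaaS platform calls once per event.

    Assumption: `event` is the decoded JSON payload as a dict, and
    `context` carries runtime metadata (unused in this sketch).
    """
    # Hypothetical input field; a real function defines its own contract.
    name = event.get("name", "world")
    body = {"message": f"Hello, {name}!"}
    # Response shape commonly used when the function sits behind an HTTP API.
    return {"statusCode": 200, "body": json.dumps(body)}


if __name__ == "__main__":
    # Local smoke test: call the handler directly, no cloud account needed.
    print(lambda_handler({"name": "architect"}, None))
```

Because the handler is a plain function, it can be unit-tested locally before it is deployed.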
Pros:
1. Easy to develop and deploy lightweight systems.
2. Good for systems with low CPU use and low traffic; saves money on infrastructure.
Cons:
1. Enterprise-scale, high-throughput applications end up paying more money to IBM, Amazon, etc.
2. We need to pay for each of:
   1. Tasks
   2. Data storage
   3. Disk space
   4. Total number of calls
In effect, it becomes much like the pay-per-use billing of mainframe systems.
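To see why low-usage systems save money while high-throughput ones pay more, here is a toy pay-per-use cost model in Python. All unit prices are hypothetical placeholders, not actual AWS or IBM rates, and the model ignores free tiers, data transfer, and other real billing details.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FaasPricing:
    # Hypothetical unit prices for illustration only; use your provider's real rates.
    per_million_calls: float = 0.20
    per_gb_second: float = 0.0000167
    storage_per_gb_month: float = 0.10


def monthly_faas_cost(calls, avg_seconds, memory_gb, storage_gb, pricing=FaasPricing()):
    """Pay-per-use: every call, every GB-second of compute, and stored data is billed."""
    call_cost = calls / 1_000_000 * pricing.per_million_calls
    compute_cost = calls * avg_seconds * memory_gb * pricing.per_gb_second
    storage_cost = storage_gb * pricing.storage_per_gb_month
    return call_cost + compute_cost + storage_cost


def cheaper_option(calls, avg_seconds, memory_gb, storage_gb, server_monthly):
    """Compare the pay-per-use bill with a fixed monthly server cost."""
    faas = monthly_faas_cost(calls, avg_seconds, memory_gb, storage_gb)
    return "faas" if faas < server_monthly else "server"


if __name__ == "__main__":
    # Low usage: 100k short calls per month vs. a hypothetical $50/month server.
    print(cheaper_option(100_000, 0.2, 0.5, 1, server_monthly=50))  # faas
    # High throughput: 500M calls per month with the same per-call profile.
    print(cheaper_option(500_000_000, 0.2, 0.5, 1, server_monthly=50))  # server
```

With these made-up rates, the low-usage case costs under thirty cents a month, while the high-throughput case comes to roughly $935, which matches the cost pattern described in the list above.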
Negotiate timelines and resources
Facilitate communication between project stakeholders
Evangelize best practices
Identify and mitigate risks
Be strong in technology/software fundamentals (not just white-paper knowledge)
Be proficient in Agile methodologies
Moving data between systems is an age-old problem that most projects have to solve.
History: It falls under flow-based programming: https://en.wikipedia.org/wiki/Flow-based_programming
Our focus is moving data from system A to system B: only Extraction and Loading, with little Transformation.
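In flow-based programming, independent components run concurrently and exchange data packets over bounded connections; NiFi's processors and queued connections follow the same shape. A minimal Python sketch of an extract-and-load flow under that model, using threads and `queue.Queue`; the component names and the in-memory "systems" are hypothetical.

```python
import queue
import threading

_DONE = object()  # end-of-stream marker sent over a connection


def extract(out_port, records):
    """Source component: reads from 'system A' and emits one packet per record."""
    for record in records:
        out_port.put(record)  # blocks while the connection is full (back pressure)
    out_port.put(_DONE)


def load(in_port, sink):
    """Sink component: writes each packet to 'system B' until end of stream."""
    while True:
        record = in_port.get()
        if record is _DONE:
            break
        sink.append(record)


def run_flow(records, capacity=2):
    """Wire the two components with a bounded connection and run them concurrently."""
    connection = queue.Queue(maxsize=capacity)
    sink = []
    components = [
        threading.Thread(target=extract, args=(connection, records)),
        threading.Thread(target=load, args=(connection, sink)),
    ]
    for c in components:
        c.start()
    for c in components:
        c.join()
    return sink
```

The bounded queue is the key design choice: a slow loader automatically throttles the extractor instead of letting unprocessed data pile up in memory.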
Option 0: Hand coding in Python / Java / Perl, etc.
This is good for small data sets and for proofs of concept (POCs).
It is not recommended for production without failover, job management, job scheduling, etc.
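For a POC, Option 0 can be as small as the Python sketch below. It uses the standard-library `sqlite3` module, with two in-memory databases standing in for systems A and B; the table and column names are hypothetical. Note what it lacks: retries, failover, job tracking, and scheduling, which is why such code should not go to production as is.

```python
import sqlite3


def extract_load(src, dst, table, batch_size=500):
    """Copy every row of `table` from src to dst: Extract and Load, no Transform.

    Assumes the target table already exists with the same columns, and that
    `table` is a trusted identifier (it is interpolated into the SQL text).
    """
    cur = src.execute(f"SELECT * FROM {table}")
    cols = [d[0] for d in cur.description]
    placeholders = ", ".join("?" for _ in cols)
    insert_sql = f"INSERT INTO {table} ({', '.join(cols)}) VALUES ({placeholders})"
    copied = 0
    while True:
        rows = cur.fetchmany(batch_size)
        if not rows:
            break
        dst.executemany(insert_sql, rows)
        dst.commit()  # commit per batch; there is no retry, failover, or scheduling
        copied += len(rows)
    return copied


if __name__ == "__main__":
    system_a = sqlite3.connect(":memory:")
    system_b = sqlite3.connect(":memory:")
    for db in (system_a, system_b):
        db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
    system_a.executemany(
        "INSERT INTO orders VALUES (?, ?)", [(i, i * 1.5) for i in range(1200)]
    )
    print(extract_load(system_a, system_b, "orders"))  # 1200
```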
Option 1: If the system is heavy and needs a robust solution, go with Apache NiFi.
The US National Security Agency open-sourced its NiagaraFiles (NiFi) data-flow software.
How to enable security for NiFi?
How to write code for NiFi in Java and other languages?
Other examples: directory with a date suffix
Commercial support is available:
Externalizing variables is possible.
Configurations are easy to move from QA to Prod.
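Externalized variables are what make the QA-to-Prod move easy: the flow refers only to variable names, and each environment supplies its own values. A language-neutral illustration in Python using `string.Template`, whose `${name}` placeholders happen to resemble NiFi's expression syntax; this is not NiFi's own variable or parameter API, and the variable names and paths are hypothetical.

```python
from string import Template

# The flow definition refers only to variable names, never to concrete values.
FLOW = {
    "source_dir": Template("${input_root}/orders"),
    "target_table": Template("${db_schema}.orders"),
}

# Hypothetical per-environment variables, kept outside the flow definition.
ENVIRONMENTS = {
    "qa": {"input_root": "/data/qa", "db_schema": "qa_etl"},
    "prod": {"input_root": "/data/prod", "db_schema": "etl"},
}


def resolve(flow, variables):
    """Bind one environment's variables; Template.substitute raises KeyError if one is missing."""
    return {name: template.substitute(variables) for name, template in flow.items()}


if __name__ == "__main__":
    # Promoting from QA to Prod changes only the variable set, not the flow.
    for env in ("qa", "prod"):
        print(env, resolve(FLOW, ENVIRONMENTS[env]))
```

Failing loudly on a missing variable (rather than silently leaving a placeholder) is deliberate: a half-configured flow should not start in Prod.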
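We can slim down the system to minimize its footprint.
NiFi supports Hadoop HDFS.
Apache Storm may look similar, but its objective is different: it is a distributed real-time computation system, not a data-flow tool.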
Option 2: Use the streaming API of Apache Spark
Sqoop vs. Flume
Option 3: If you are using CDAP, use Hydrator to generate the pipeline JSON and run that.
More study is required around metrics, management, and tracking of these jobs.
Better to stay away from the CDAP stack. It has little public adoption, and their forums are unresponsive: if we ask a question, they won't answer. If we call them, they ask us to buy their support/consulting hours. Nothing is wrong with that, but we can't afford it.
Their poor support is visible in their user groups.
Option 4: Pentaho Kettle
As of March 2017, it is not ready for Big Data.
Good for small Java enterprise projects (coding against the Kettle API is required). We have used it in the past.
http://javadoc.pentaho.com/kettle/ – the Java documentation quality is poor.
Option 5: Alteryx
http://www.alteryx.com/ is a good product and has better support for https://www.tableau.com/ (BI/Analytics).
Option 6: Spring Batch
If we want to minimize the number of servers and keep the solution minimal, Spring Batch is a good choice.
However, it needs continuous maintenance whenever the Spring or Java version changes.
Spring Integration: http://docs.spring.io/spring-integration/reference/html/ftp.html
Spring batch partitioning: https://keyholesoftware.com/2013/12/09/spring-batch-partitioning/
Spring Batch Reference: http://docs.spring.io/spring-batch/reference/html/index.html
Spring Batch UI: http://docs.spring.io/spring-batch-admin/reference/reference.xhtml
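Spring Batch's partitioning idea (see the partitioning link above) is to split the input key range into partitions and process each one in parallel, reading, processing, and writing in fixed-size chunks. Below is a language-neutral Python sketch of that idea, not the Spring Batch API: `partition_range` plays the role of a column-range partitioner, `grid_size` and `chunk_size` loosely mirror Spring Batch's grid size and commit interval, and the reader/processor/writer stand-ins (including the doubling step) are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def partition_range(min_id, max_id, grid_size):
    """Split [min_id, max_id] into at most grid_size contiguous, non-overlapping ranges."""
    total = max_id - min_id + 1
    if total <= 0:
        return []
    size = -(-total // grid_size)  # ceiling division
    return [(start, min(start + size - 1, max_id))
            for start in range(min_id, max_id + 1, size)]


def process_partition(bounds, chunk_size, writer):
    """Read-process-write one partition, handing the writer one chunk at a time."""
    low, high = bounds
    chunk = []
    for item_id in range(low, high + 1):  # stand-in for a reader
        chunk.append(item_id * 2)         # stand-in for a processor
        if len(chunk) == chunk_size:
            writer(chunk)                 # stand-in for a writer plus commit
            chunk = []
    if chunk:
        writer(chunk)                     # flush the final partial chunk
    return high - low + 1


def run_partitioned(min_id, max_id, grid_size=4, chunk_size=10):
    """Process every partition concurrently; return (items processed, sorted output)."""
    written, lock = [], threading.Lock()

    def writer(chunk):
        with lock:  # partitions run in parallel, so guard the shared sink
            written.extend(chunk)

    parts = partition_range(min_id, max_id, grid_size)
    with ThreadPoolExecutor(max_workers=grid_size) as pool:
        counts = list(pool.map(lambda b: process_partition(b, chunk_size, writer), parts))
    return sum(counts), sorted(written)
```

For example, `partition_range(1, 100, 4)` yields four ranges of 25 ids each, and each worker commits its output every 10 items.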
Use Apache NiFi as much as possible: it works well in production and is also quick for POCs.
As of March 11, 2017: https://groups.google.com/forum/#!topic/cdap-user/hiuUP3jIxNs
CDAP Hydrator is not in a position to compete with Apache NiFi