Thursday, February 6, 2014

Reactive Real-time Big Data with Open Source Lambda Architecture Stack

5 Whys

1) Why "Reactive"?
  • react to events
  • react to load
  • react to failure
  • react to users
2) Why "Real-time"?
3) Why "Big Data"?
4) Why "Open Source"?
Security, Quality, Customizability, Freedom, Flexibility, Interoperability, Auditability, Support Options, Cost, Try Before You Buy

5) Why "Lambda Architecture"?

The open source frameworks and tools I have tried:

● Netty (a framework that uses the reactive programming pattern to make HTTP systems easier to scale, by JBoss)

● Apache Kafka (publish-subscribe messaging rethought as a distributed commit log, open sourced by LinkedIn)

● Storm (a framework for distributed real-time computation, open sourced by Twitter)

● Akka (Actor Model), a toolkit and runtime for building highly concurrent, distributed, and fault tolerant event-driven applications on the JVM. 
More use cases at

● Redis (an advanced in-memory key-value NoSQL database; all fast statistical computations happen here)

● OrientDB, an open source NoSQL DBMS with the features of both document and graph DBMSs, used for KPI report data management

● Groovy and Grails for the scripting layer on the JVM, ad-hoc queries on Redis, and the front end

● Hadoop ecosystem: HDFS, Hive, HBase for batch processing

● RxJava, a library for composing asynchronous and event-based programs

● Hystrix: latency and fault tolerance for distributed systems

● NVD3: reusable D3 charts
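
To make the split between the batch tools (Hadoop) and the real-time tools (Storm, Redis) in this list more concrete, here is a minimal sketch of the Lambda Architecture idea in plain Java. It is only an illustration, not code from the stack itself: the class LambdaSketch, its methods, and the key "page:/home" are all hypothetical names, and two in-memory maps stand in for the batch view (which a Hadoop job would produce) and the speed view (which a Storm topology might keep in Redis). A query merges both views.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: in-memory maps stand in for the real batch and speed stores.
class LambdaSketch {
    // Batch view: counts recomputed periodically from the full master dataset.
    private final Map<String, Long> batchView = new HashMap<>();
    // Speed view: counts for events that arrived after the last batch run.
    private final Map<String, Long> speedView = new HashMap<>();

    // Speed layer: update the real-time view incrementally for each event.
    public void onEvent(String key) {
        speedView.merge(key, 1L, Long::sum);
    }

    // Batch layer finished a run: swap in the new view and drop the
    // real-time counts. Simplifying assumption: the batch run covers every
    // event seen so far, so nothing in the speed view is still needed.
    public void loadBatchView(Map<String, Long> recomputed) {
        batchView.clear();
        batchView.putAll(recomputed);
        speedView.clear();
    }

    // Serving layer: answer a query by merging the batch and speed views.
    public long query(String key) {
        return batchView.getOrDefault(key, 0L) + speedView.getOrDefault(key, 0L);
    }

    public static void main(String[] args) {
        LambdaSketch views = new LambdaSketch();

        Map<String, Long> batchResult = new HashMap<>();
        batchResult.put("page:/home", 100L);
        views.loadBatchView(batchResult);

        // Two page views arrive after the batch run.
        views.onEvent("page:/home");
        views.onEvent("page:/home");

        System.out.println(views.query("page:/home")); // prints 102
    }
}
```

The point of the design is that the batch layer can be slow but complete, while the speed layer only has to cover the gap since the last batch run. A real deployment would need a more careful handoff between the two than the clear() call used here.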