Lambda Architecture (LA) is a well-known data processing architecture designed to handle massive amounts of data, commonly known as Big Data. It combines both batch and stream processing methods.
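To make the idea concrete: the batch layer periodically recomputes views from the complete master dataset, the speed layer keeps incremental views over only the most recent data, and a query merges both. Here is a minimal conceptual sketch in Scala, using plain in-memory collections and illustrative names (it is not tied to any particular framework):

```scala
// Conceptual sketch of the Lambda Architecture query path.
// The data, function names, and keys are illustrative placeholders.
object LambdaSketch {

  // Batch layer: recompute a view from the complete master dataset.
  def batchView(masterDataset: Seq[(String, Long)]): Map[String, Long] =
    masterDataset.groupBy(_._1).map { case (k, vs) => k -> vs.map(_._2).sum }

  // Speed layer: maintain an incremental view over recent events only.
  def realtimeView(recentEvents: Seq[(String, Long)]): Map[String, Long] =
    recentEvents.groupBy(_._1).map { case (k, vs) => k -> vs.map(_._2).sum }

  // Serving layer: answer a query by merging the batch and real-time views.
  def query(key: String,
            batch: Map[String, Long],
            realtime: Map[String, Long]): Long =
    batch.getOrElse(key, 0L) + realtime.getOrElse(key, 0L)

  def main(args: Array[String]): Unit = {
    val master = Seq("pageA" -> 10L, "pageB" -> 4L, "pageA" -> 7L)
    val recent = Seq("pageA" -> 2L)
    println(s"pageA views = ${query("pageA", batchView(master), realtimeView(recent))}") // 19
  }
}
```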

When I started my journey in Data Science and needed to process massive amounts of data, I came across an excellent book:

“Big Data: Principles and best practices of scalable real-time data systems” by Nathan Marz and James Warren.
I recommend this book to all Big Data/Data Science developers.
In a Big Data/Data Science project, LA aims to satisfy the need to scale up and down independently while keeping the system fault tolerant. Therefore, it is essential to get the structure of your project right before starting the implementation.
I spent almost a couple of years building Lambda Architecture designs for client-specific projects. I realized that starter-kit projects are lacking in the Big Data domain compared to the JavaScript world. So, I have published a minimal skeleton project to process Big Data:

Introducing: ETL-Starter-Kit
Since the repository is meant to provide only the structure, the various sample jobs are not implemented. Based on your requirements, feel free to modify it and implement different types of batch/streaming jobs (Spark, Hive, Pig, etc.), along the lines of the sketch below.
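As an example of the kind of batch job you could drop into the skeleton, here is a small Spark (Scala) ETL step that reads raw events, aggregates them, and writes a batch view. The paths, column names, and object name are placeholders and do not come from the ETL-Starter-Kit repository itself:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.count

// Hypothetical batch job: all paths and column names are placeholders.
object PageViewBatchJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("page-view-batch-job")
      .getOrCreate()

    // Extract: load raw events from the master dataset (placeholder path).
    val events = spark.read.json("hdfs:///data/master/events")

    // Transform: compute total views per page.
    val batchView = events.groupBy("pageId").agg(count("*").as("views"))

    // Load: persist the recomputed batch view for the serving layer.
    batchView.write.mode("overwrite").parquet("hdfs:///data/views/page_views")

    spark.stop()
  }
}
```

A streaming (speed-layer) job would follow the same shape but consume from a source such as Kafka and update its view incrementally instead of overwriting it.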