Lambda architecture (batch + streaming) in Hadoop - Step-by-Step Execution

Concept Flow - Lambda architecture (batch + streaming)

Raw Data Input

↓

Batch Layer (all data, large chunks) | Speed Layer (recent data, real time)

↓

Batch Views | Real-time Views

↓

Serving Layer

↓

User Queries

Data flows from raw input into two paths: the batch layer for large-scale processing and the speed layer for real-time updates. The serving layer combines both results to answer user queries.

Execution Sample

1. Collect raw data continuously
2. Batch layer processes data in large chunks
3. Speed layer processes data in real-time
4. Serving layer merges batch and speed views
5. User queries get combined results

This shows how data moves through batch and speed layers, then merges for user queries.
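
The five steps above can be sketched in plain Python. This is a minimal illustration and not Hadoop code: the `LambdaPipeline` class, its method names, and the page-view counting use case are all assumptions made for the example. In a real deployment the batch layer would typically be a MapReduce or Spark job over HDFS, and the speed layer a separate stream processor.

```python
from collections import Counter


class LambdaPipeline:
    """Toy in-memory model of the batch, speed, and serving layers (hypothetical)."""

    def __init__(self):
        self.master_dataset = []     # step 1: append-only raw data
        self.batch_view = Counter()  # output of the batch layer
        self.speed_view = Counter()  # output of the speed layer

    def collect(self, page):
        """Steps 1 and 3: store the raw event and update the real-time view."""
        self.master_dataset.append(page)
        self.speed_view[page] += 1

    def run_batch(self):
        """Step 2: recompute the batch view from all raw data.

        Simplification: the batch run is treated as instantaneous, so the
        speed view can be cleared safely. A real system must keep events
        that arrive while the batch job is still running.
        """
        self.batch_view = Counter(self.master_dataset)
        self.speed_view.clear()

    def query(self, page):
        """Steps 4 and 5: merge both views to answer a query."""
        return self.batch_view[page] + self.speed_view[page]


pipeline = LambdaPipeline()
for page in ["home", "home", "about"]:
    pipeline.collect(page)
pipeline.run_batch()           # batch view now covers the first three events
pipeline.collect("home")       # arrives after the batch run, speed layer only
print(pipeline.query("home"))  # 2 from the batch view + 1 from the speed view
```

The query never reads the raw data directly. It only adds two precomputed views, which is what keeps query latency low.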

Execution Table

Step	Layer	Action	Data Processed	Output Produced
1	Raw Data Input	Collect data stream	Events 1-1000	Raw data stored
2	Batch Layer	Process batch data	Events 1-1000	Batch view updated
3	Speed Layer	Process real-time data	Events 1001-1100	Real-time view updated
4	Serving Layer	Merge batch and speed views	Batch + Real-time views	Unified view ready
5	User Queries	Query unified view	Unified view	Query results returned
6	Batch Layer	Next batch run (recompute over all stored data)	Events 1-2000	Batch view updated
7	Speed Layer	Process new real-time data	Events 2001-2100	Real-time view updated
8	Serving Layer	Merge updated views	Batch + Real-time views	Unified view refreshed
9	User Queries	Query refreshed view	Unified view	Updated query results
10	End	No more data or queries	-	-

💡 Execution stops when no new data arrives and no more queries are made.
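
The table can be replayed as a short simulation. In this sketch each view is modeled as the set of event IDs it reflects, which is a stand-in for real aggregates. Two assumptions are not stated in the table: events 1101-2000 arrive between Step 5 and Step 6, and the speed view drops events once the batch view covers them.

```python
def batch_process(raw_events):
    """Batch layer: recompute the view from the full raw data set."""
    return set(raw_events)


def merge(batch_view, speed_view):
    """Serving layer: combine both views into one unified view."""
    return batch_view | speed_view


raw = list(range(1, 1001))                # step 1: events 1-1000 stored
batch_view = batch_process(raw)           # step 2: batch view covers 1-1000

recent = list(range(1001, 1101))          # step 3: events 1001-1100 arrive
raw.extend(recent)
speed_view = set(recent)

unified = merge(batch_view, speed_view)   # step 4
first_query = len(unified)                # step 5
print(first_query)                        # 1100 events visible

raw.extend(range(1101, 2001))             # assumed arrivals before the next batch
batch_view = batch_process(raw)           # step 6: batch view covers 1-2000

recent = list(range(2001, 2101))          # step 7: events 2001-2100 arrive
raw.extend(recent)
speed_view = set(recent)                  # older speed entries expired (assumption)

unified = merge(batch_view, speed_view)   # step 8
second_query = len(unified)               # step 9
print(second_query)                       # 2100 events visible
```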

Variable Tracker

Variable	Start	After Step 2	After Step 3	After Step 4	After Step 6	After Step 7	After Step 8
Raw Data	Empty	Events 1-1000	Events 1-1100	Events 1-1100	Events 1-2000	Events 1-2100	Events 1-2100
Batch View	Empty	Processed Events 1-1000	Processed Events 1-1000	Processed Events 1-1000	Processed Events 1-2000	Processed Events 1-2000	Processed Events 1-2000
Speed View	Empty	Empty	Processed Events 1001-1100	Processed Events 1001-1100	Processed Events 1001-1100	Processed Events 2001-2100	Processed Events 2001-2100
Unified View	Empty	Empty	Empty	Batch + Speed Views merged	Stale until next merge	Stale until next merge	Batch + Speed Views merged

Key Moments - 3 Insights

Why do we need both batch and speed layers instead of just one?

How does the serving layer combine data from batch and speed layers?

What happens if the speed layer misses some data?

Visual Quiz - 3 Questions

Test your understanding

Look at the execution table. What data does the speed layer process at Step 3?

A. Events 1001-1100

B. Events 1-1000

C. Events 1-1100

D. Events 2001-2100

Concept Snapshot

Lambda Architecture combines batch and streaming data processing.
Batch layer processes large data sets slowly but accurately.
Speed layer processes recent data quickly but less accurately.
Serving layer merges both views for fast and accurate queries.
This design balances latency and accuracy in big data systems.
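
The accuracy trade-off in the snapshot can be illustrated with a small sketch. The dropped events are an assumption chosen for the example (here the speed layer loses every 10th event), not a property of any particular streaming engine. The batch layer recomputes from the complete raw data, so its result replaces the approximate one.

```python
from collections import Counter

events = ["click"] * 50            # hypothetical raw stream of 50 events

# Speed layer: incremental count that, by assumption, drops every 10th event.
speed_view = Counter()
for seq, event in enumerate(events, start=1):
    if seq % 10 == 0:
        continue                   # simulated loss in the streaming path
    speed_view[event] += 1

approximate = speed_view["click"]  # fast answer, but it undercounts
print(approximate)                 # 45

# Batch layer: full recomputation from the raw data, which lost nothing.
batch_view = Counter(events)
speed_view.clear()                 # the batch view now supersedes the speed view

corrected = batch_view["click"] + speed_view["click"]
print(corrected)                   # 50
```

Any error in the speed view lasts only until the next batch run covers the same events.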

Full Transcript

Lambda architecture splits data processing into a batch layer and a speed layer, and raw data flows into both. The batch layer processes large chunks of data to create accurate batch views. The speed layer processes recent data in real time to create fast but approximate views. The serving layer merges these views so that user queries get results that are both up to date and accurate. By combining the strengths of batch and streaming processing, this approach aims for low latency without giving up accuracy.