Data flows from raw input into two paths: batch for large-scale processing and speed for real-time updates. Both results combine in the serving layer to answer user queries.
Execution Sample
Hadoop
1. Collect raw data continuously
2. Batch layer processes data in large chunks
3. Speed layer processes data in real-time
4. Serving layer merges batch and speed views
5. User queries get combined results
This shows how data moves through batch and speed layers, then merges for user queries.
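The five steps above can be sketched in a few lines of plain Python. This is a toy model, not Hadoop code: the event shape `(event_id, page)`, the page-view metric, and the function names `collect`, `run_batch`, and `query` are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical event shape: (event_id, page). The metric is views per page.
master_dataset = []      # step 1: append-only raw data
batch_view = Counter()   # step 2: rebuilt from scratch on every batch run
speed_view = Counter()   # step 3: incremental counts since the last batch run


def collect(event):
    """Steps 1 and 3: store the raw event, then update the speed view at once."""
    master_dataset.append(event)
    speed_view[event[1]] += 1


def run_batch():
    """Step 2: recompute the batch view from all raw data.

    Simplification: the speed view is cleared in the same call, which ignores
    events that arrive while a real batch job is still running.
    """
    batch_view.clear()
    batch_view.update(page for _, page in master_dataset)
    speed_view.clear()


def query(page):
    """Steps 4 and 5: merge both views at query time and return the total."""
    return batch_view[page] + speed_view[page]


collect((1, "home"))
collect((2, "about"))
run_batch()              # batch view now covers events 1-2
collect((3, "home"))     # only the speed view has seen event 3
print(query("home"))     # 1 from the batch view + 1 from the speed view
```

The query never reads the raw data directly. It only touches the two precomputed views, which is what keeps query latency low in this design.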
Execution Table
| Step | Layer | Action | Data Processed | Output Produced |
|---|---|---|---|---|
| 1 | Raw Data Input | Collect data stream | Events 1-1000 | Raw data stored |
| 2 | Batch Layer | Process batch data | Events 1-1000 | Batch view updated |
| 3 | Speed Layer | Process real-time data | Events 1001-1100 | Real-time view updated |
| 4 | Serving Layer | Merge batch and speed views | Batch + Real-time views | Unified view ready |
| 5 | User Queries | Query unified view | Unified view | Query results returned |
| 6 | Batch Layer | Next batch processing | Events 1-2000 | Batch view updated |
| 7 | Speed Layer | Process new real-time data | Events 2001-2100 | Real-time view updated |
| 8 | Serving Layer | Merge updated views | Batch + Real-time views | Unified view refreshed |
| 9 | User Queries | Query refreshed view | Unified view | Updated query results |
| 10 | End | No more data or queries | - | - |
💡 Execution stops when no new data arrives and no further queries are made.
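The event ranges in the Execution Table follow a simple pattern: each batch run reprocesses everything from event 1, and the speed layer covers only the events that arrived after the batch cutoff. The sketch below reproduces those ranges. The function names and the sizes of 1000 and 100 events are assumptions taken from the example numbers, not properties of any real system.

```python
def layer_ranges(round_number, batch_size=1000, speed_size=100):
    """Return the inclusive event-ID ranges each layer covers in one round.

    batch_size and speed_size are illustrative defaults chosen to match the
    table; they are not properties of any particular system.
    """
    batch_end = round_number * batch_size
    return {
        "batch": (1, batch_end),                            # full recompute
        "speed": (batch_end + 1, batch_end + speed_size),   # recent events only
    }


def unified_coverage(ranges):
    """The merged view spans the first batch event through the last speed event."""
    (batch_start, batch_end), (speed_start, speed_end) = ranges["batch"], ranges["speed"]
    assert speed_start == batch_end + 1, "views must neither overlap nor leave a gap"
    return (batch_start, speed_end)


for round_number in (1, 2):
    ranges = layer_ranges(round_number)
    print(round_number, ranges, unified_coverage(ranges))
```

Round 1 corresponds to Steps 2 to 4 and round 2 to Steps 6 to 8. The check inside `unified_coverage` makes explicit that the two views must meet exactly at the batch cutoff, otherwise the merged result would double count or drop events.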
Variable Tracker
| Variable | Start | After Step 2 | After Step 3 | After Step 4 | After Step 6 | After Step 7 | After Step 8 |
|---|---|---|---|---|---|---|---|
| Raw Data | Empty | Events 1-1000 | Events 1-1100 | Events 1-1100 | Events 1-2000 | Events 1-2100 | Events 1-2100 |
| Batch View | Empty | Processed Events 1-1000 | Processed Events 1-1000 | Processed Events 1-1000 | Processed Events 1-2000 | Processed Events 1-2000 | Processed Events 1-2000 |
| Speed View | Empty | Empty | Processed Events 1001-1100 | Processed Events 1001-1100 | Processed Events 1001-1100 | Processed Events 2001-2100 | Processed Events 2001-2100 |
| Unified View | Empty | Empty | Empty | Batch + Speed Views merged | Stale (merged at Step 4) | Stale (merged at Step 4) | Batch + Speed Views merged |
Key Moments - 3 Insights
Why do we need both batch and speed layers instead of just one?
The batch layer handles large volumes of data accurately but slowly (see Steps 2 and 6). The speed layer handles recent data quickly but less accurately (see Steps 3 and 7). Combining both gives results that are fast and accurate (Steps 4 and 8).
How does the serving layer combine data from batch and speed layers?
The serving layer merges batch views (historical data) with speed views (real-time data) to create a unified view for queries (Steps 4 and 8 in the Execution Table).
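For an additive metric such as a count, the merge can be as simple as a key-wise sum of the two views. The sketch below assumes both views are dictionaries of per-page counts. The page names and numbers are invented for illustration, and `merge_views` is a hypothetical helper, not part of any library.

```python
from collections import Counter


def merge_views(batch_view, speed_view):
    """Key-wise sum of two count views; valid only for additive metrics."""
    merged = Counter(batch_view)
    merged.update(speed_view)   # Counter.update adds counts instead of replacing them
    return merged


# Invented sample data: the batch view covers events 1-1000,
# the speed view covers events 1001-1100.
batch_view = {"home": 620, "about": 380}
speed_view = {"home": 70, "pricing": 30}

unified = merge_views(batch_view, speed_view)
print(unified["home"], unified["about"], unified["pricing"])
```

A plain sum works here because counts from disjoint event ranges can be added. Metrics that are not additive, such as averages or distinct counts, would need a different merge rule.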
What happens if the speed layer misses some data?
The batch layer eventually reprocesses all raw data in large chunks, so any events the speed layer missed are picked up on the next batch run. This restores accuracy over time (compare Steps 2 and 6).
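This self-correcting behavior can be shown with a small simulation. In the sketch below the speed layer drops one event on purpose, and the next batch run recomputes the count from the raw data. The event list and the dropped index are made up for illustration.

```python
from collections import Counter

# The raw data store keeps every event, even the ones the speed layer misses.
raw_events = ["home", "about", "home", "home"]

# Speed layer: simulate a failure that drops the third event (index 2).
speed_view = Counter()
for index, page in enumerate(raw_events):
    if index == 2:
        continue                 # hypothetical dropped event
    speed_view[page] += 1

count_before = speed_view["home"]    # undercounts "home" by one

# Next batch run: recompute from the raw data and discard the old speed view.
batch_view = Counter(raw_events)
speed_view = Counter()

count_after = batch_view["home"] + speed_view["home"]
print(count_before, count_after)
```

The correction depends on the raw data store being complete. If an event never reaches the raw data, the batch layer cannot recover it either.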
Visual Quiz - 3 Questions
Test your understanding
Looking at the Execution Table, what data does the speed layer process at Step 3?
A. Events 1001-1100
B. Events 1-1000
C. Events 1-1100
D. Events 2001-2100
💡 Hint
Check the 'Data Processed' column for Step 3 in the Execution Table.
At which step does the serving layer first merge batch and speed views?
A. Step 2
B. Step 3
C. Step 4
D. Step 6
💡 Hint
Look for 'Serving Layer' and 'Merge batch and speed views' in the Execution Table.
If the batch layer processes data more frequently, how would the Batch View row in the Variable Tracker change?
A. Batch view updates less often
B. Batch view updates more often with more data
C. Speed view updates more often
D. Unified view stops updating
💡 Hint
Refer to the 'Batch View' row in the Variable Tracker and think about batch processing frequency.
Concept Snapshot
Lambda Architecture combines batch and streaming data processing.
Batch layer processes large data sets slowly but accurately.
Speed layer processes recent data quickly but less accurately.
Serving layer merges both views for fast and accurate queries.
This design balances latency and accuracy in big data systems.
Full Transcript
Lambda architecture splits data processing into batch and speed layers. Raw data flows into both layers. The batch layer processes large chunks of data to create accurate batch views. The speed layer processes data in real-time to create fast but approximate views. The serving layer merges these views to provide users with up-to-date and accurate query results. This approach ensures low latency and high accuracy by combining the strengths of both batch and streaming processing.