0
0
Hadoopdata~30 mins

Lambda architecture (batch + streaming) in Hadoop - Mini Project: Build & Apply

Choose your learning style9 modes available
Lambda Architecture with Batch and Streaming Data
📖 Scenario: You work for a retail company that collects sales data from stores. The data comes in two ways: large files uploaded daily (batch data) and live sales events streaming in real-time (streaming data). You want to combine both to get a complete view of sales.
🎯 Goal: Build a simple Lambda architecture example that processes batch sales data and streaming sales data separately, then combines their results to get total sales per product.
📋 What You'll Learn
Create a batch dataset of sales with product names and quantities
Create a streaming dataset of sales events with product names and quantities
Write batch processing logic to sum quantities per product
Write streaming processing logic to sum quantities per product
Combine batch and streaming results to get total sales per product
Print the combined total sales
💡 Why This Matters
🌍 Real World
Retail companies often collect sales data in batches (daily reports) and streams (live transactions). Combining both helps get up-to-date sales insights.
💼 Career
Data engineers and data scientists use Lambda architecture to handle large-scale data processing combining batch and real-time data for analytics and reporting.
Progress0 / 4 steps
1
Create batch sales data
Create a dictionary called batch_sales with these exact entries: 'apple': 100, 'banana': 150, 'orange': 120 representing quantities sold in batch data.
Hadoop
Need a hint?

Use curly braces to create a dictionary with keys as product names and values as quantities.

2
Create streaming sales data
Create a list called streaming_sales with these exact tuples representing live sales events: ('apple', 20), ('banana', 30), ('orange', 25).
Hadoop
Need a hint?

Use square brackets to create a list of tuples with product names and quantities.

3
Process batch and streaming data
Create a dictionary called streaming_totals that sums quantities per product from streaming_sales using a for loop with variables product and qty. Then create a dictionary called total_sales that adds batch and streaming quantities per product.
Hadoop
Need a hint?

Use a for loop to sum streaming sales. Use dict.get(key, 0) to handle missing keys.

4
Print total sales per product
Write a for loop with variables product and qty to iterate over total_sales.items() and print the product and total quantity in the format: Product: apple, Total Sold: 120.
Hadoop
Need a hint?

Use a for loop over total_sales.items() and an f-string to format the print output.