Hadoop · data · ~30 mins

Batch vs real-time ingestion in Hadoop - Hands-On Comparison

Batch vs Real-time Data Ingestion in Hadoop
📖 Scenario: You work at a company that collects sales data from stores. Some data comes in big chunks once a day (batch), and some data comes in quickly as sales happen (real-time). You want to understand how to handle both types of data ingestion using Hadoop tools.
🎯 Goal: Build a simple example to show how batch and real-time data ingestion can be set up and processed in Hadoop. You will create data samples, configure ingestion types, process the data accordingly, and display the results.
📋 What You'll Learn
Create sample sales data for batch and real-time ingestion
Set a configuration variable to select ingestion type
Write core logic to process data differently based on ingestion type
Print the processed sales summary
💡 Why This Matters
🌍 Real World
Companies collect data in batches (like daily reports) and in real-time (like live sales). Understanding how to handle both helps in timely decision making.
💼 Career
Data engineers and analysts often work with batch and real-time data ingestion pipelines using Hadoop and related tools.
1
Create sample sales data for batch and real-time ingestion
Create two lists called batch_sales and realtime_sales. batch_sales should have these dictionaries: {'store': 'StoreA', 'sales': 100}, {'store': 'StoreB', 'sales': 150}. realtime_sales should have these dictionaries: {'store': 'StoreA', 'sales': 10}, {'store': 'StoreB', 'sales': 15}.
Need a hint?

Use a list of dictionaries to represent the sales data for each store.
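
One way this step might look, as a minimal sketch in plain Python (assumed here because the instructions use Python list and dictionary syntax). The data lives in memory only; nothing in this sketch calls a Hadoop API.

```python
# Batch data: larger totals that would arrive once a day.
batch_sales = [
    {'store': 'StoreA', 'sales': 100},
    {'store': 'StoreB', 'sales': 150},
]

# Real-time data: small amounts that would arrive as sales happen.
realtime_sales = [
    {'store': 'StoreA', 'sales': 10},
    {'store': 'StoreB', 'sales': 15},
]
```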

2
Set ingestion type configuration
Create a variable called ingestion_type and set it to the string 'batch' or 'realtime' to select which data to process.
Need a hint?

Set ingestion_type to either 'batch' or 'realtime' as a string.
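
A possible version of this step is sketched below. The membership check is an optional extra that the exercise does not ask for; it is included only to catch a mistyped value early.

```python
# Select which data set to process: 'batch' or 'realtime'.
ingestion_type = 'batch'

# Optional guard (an addition, not part of the exercise).
if ingestion_type not in ('batch', 'realtime'):
    raise ValueError(f"unknown ingestion_type: {ingestion_type!r}")
```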

3
Process sales data based on ingestion type
Create a variable called sales_data that is batch_sales if ingestion_type is 'batch', else realtime_sales. Then create a dictionary called total_sales that sums the sales per store from sales_data, using a for loop with the loop variable record.
Need a hint?

Use a conditional expression to pick sales_data. Use a for loop to add sales per store.
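
The selection and aggregation described in this step could be sketched as follows. The sample data and ingestion_type are repeated so the snippet runs on its own; using dict.get with a default of 0 is one choice among several for accumulating the totals.

```python
batch_sales = [
    {'store': 'StoreA', 'sales': 100},
    {'store': 'StoreB', 'sales': 150},
]
realtime_sales = [
    {'store': 'StoreA', 'sales': 10},
    {'store': 'StoreB', 'sales': 15},
]
ingestion_type = 'batch'

# Conditional expression: pick the data set that matches the ingestion type.
sales_data = batch_sales if ingestion_type == 'batch' else realtime_sales

# Sum sales per store; get() returns 0 the first time a store is seen.
total_sales = {}
for record in sales_data:
    store = record['store']
    total_sales[store] = total_sales.get(store, 0) + record['sales']
```

Because the loop accumulates rather than assigns, it would still give correct totals if a store appeared in more than one record.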

4
Print the total sales summary
Write a print statement to display the total_sales dictionary.
Need a hint?

Use print(total_sales) to show the sales summary.
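
To compare both ingestion types side by side, the four steps can be wrapped in a small function. This is a sketch only: the helper name summarize is hypothetical and not part of the exercise, which asks for a single print(total_sales).

```python
batch_sales = [
    {'store': 'StoreA', 'sales': 100},
    {'store': 'StoreB', 'sales': 150},
]
realtime_sales = [
    {'store': 'StoreA', 'sales': 10},
    {'store': 'StoreB', 'sales': 15},
]


def summarize(ingestion_type):
    """Return total sales per store for the chosen ingestion type.

    Hypothetical helper: it bundles steps 2 and 3 of the exercise.
    """
    sales_data = batch_sales if ingestion_type == 'batch' else realtime_sales
    total_sales = {}
    for record in sales_data:
        store = record['store']
        total_sales[store] = total_sales.get(store, 0) + record['sales']
    return total_sales


print(summarize('batch'))     # {'StoreA': 100, 'StoreB': 150}
print(summarize('realtime'))  # {'StoreA': 10, 'StoreB': 15}
```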