Hadoop · data · ~30 mins

Kappa architecture (streaming only) in Hadoop - Mini Project: Build & Apply

Kappa Architecture Streaming Data Processing with Hadoop
📖 Scenario: You work at a company that collects real-time sensor data from machines on a factory floor. The data streams in continuously and must be processed immediately to detect anomalies. Using the Kappa architecture, you will build a simple streaming data pipeline with Hadoop tools to process this data in real time.
🎯 Goal: Build a streaming data processing pipeline using Kappa architecture principles with Hadoop tools. You will simulate streaming data, configure a processing threshold, apply streaming logic to filter data, and output the filtered results.
📋 What You'll Learn
Create a simulated streaming data list of sensor readings with exact values
Add a threshold variable to filter sensor readings
Use a streaming processing loop to filter readings above the threshold
Print the filtered streaming data output
💡 Why This Matters
🌍 Real World
Factories and industries use streaming data pipelines to monitor machines and detect problems instantly to avoid downtime.
💼 Career
Understanding Kappa architecture and streaming data processing is essential for data engineers and data scientists working with real-time data systems.
1. Create simulated streaming sensor data
Create a list called sensor_stream with these exact integer values representing sensor readings: 12, 7, 15, 9, 20, 5.
Need a hint?

Use square brackets to create a list and separate values with commas.
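
One possible way to write this step is sketched below. It assumes the exercise is written in Python (the list syntax and print call in the hints suggest so), and the plain list only stands in for a real stream.

```python
# Simulated stream of sensor readings, in arrival order.
# A plain Python list stands in for a real streaming source here.
sensor_stream = [12, 7, 15, 9, 20, 5]
```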

2. Set a threshold for filtering sensor readings
Create a variable called threshold and set it to the integer value 10.
Need a hint?

Assign the number 10 to the variable threshold using the equals sign.
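
A minimal sketch of this step, again assuming Python:

```python
# Readings strictly greater than this value will be kept by the filter.
threshold = 10
```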

3. Filter streaming data above the threshold
Create a new list called filtered_stream that contains only the readings from sensor_stream that are greater than threshold. Use a for loop with the variable reading to iterate over sensor_stream and append each qualifying reading to filtered_stream.
Need a hint?

Start with an empty list, loop through sensor_stream, check if each reading is greater than threshold, and add it to filtered_stream if yes.
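
The loop described in this step could look like the sketch below. The first two assignments repeat the values from the earlier steps so that the snippet runs on its own; the language is assumed to be Python.

```python
# Values from the earlier steps, repeated so this snippet is self-contained.
sensor_stream = [12, 7, 15, 9, 20, 5]
threshold = 10

# Process readings one at a time, in arrival order, as a stream would deliver them.
filtered_stream = []
for reading in sensor_stream:
    if reading > threshold:  # strictly greater than the threshold
        filtered_stream.append(reading)
```

With the values above, filtered_stream ends up holding 12, 15, and 20. The readings 7, 9, and 5 are dropped because they do not exceed the threshold of 10.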

4. Print the filtered streaming data output
Write a print statement to display the filtered_stream list.
Need a hint?

Use print(filtered_stream) to show the filtered readings.
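
Putting the four steps together gives a complete script along these lines. This is a sketch assuming Python and the exact values given in the steps, not a reference solution.

```python
# Step 1: simulated stream of sensor readings.
sensor_stream = [12, 7, 15, 9, 20, 5]

# Step 2: filtering threshold.
threshold = 10

# Step 3: keep only the readings above the threshold.
filtered_stream = []
for reading in sensor_stream:
    if reading > threshold:
        filtered_stream.append(reading)

# Step 4: display the filtered readings.
print(filtered_stream)  # prints [12, 15, 20]
```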