Hadoop · data · ~15 mins

When to use Hadoop in modern data stacks - Mini Project: Build & Apply

📖 Scenario: You work in a company that collects a lot of data from different sources every day. Your team wants to decide if Hadoop is the right tool to store and process this data alongside other modern tools.
🎯 Goal: Learn how to identify situations where Hadoop is useful in modern data stacks by defining a simple size-based rule and applying it to example data scenarios.
📋 What You'll Learn
Create a dictionary with example data scenarios and their data sizes
Add a threshold variable for big data size
Use a comprehension to select scenarios where Hadoop is recommended
Print the selected scenarios
💡 Why This Matters
🌍 Real World
Companies collect data from many sources, such as logs, databases, and sensors. Hadoop helps store and process very large data sets by spreading storage and computation across a cluster of machines.
💼 Career
Knowing when to use Hadoop helps data engineers and analysts choose the right tools for big data projects, which can improve performance and reduce cost.
1
Create example data scenarios
Create a dictionary called data_scenarios with these exact entries: 'small_logs': 10, 'medium_db': 500, 'large_clickstream': 5000, 'huge_sensor_data': 20000. The numbers represent data size in gigabytes.
Hint: Use curly braces to create a dictionary and separate each key-value pair with a comma.
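
One possible way to write this step is sketched below. The keys and sizes are the ones listed in the step; the one-entry-per-line layout and the comment are only a readability choice.

```python
# Example data scenarios; each value is a data size in gigabytes.
data_scenarios = {
    'small_logs': 10,
    'medium_db': 500,
    'large_clickstream': 5000,
    'huge_sensor_data': 20000,
}
```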

2
Set the big data size threshold
Create a variable called big_data_threshold and set it to 1000 to represent the data size in gigabytes above which Hadoop is recommended.
Hint: Assign the number 1000 to the variable big_data_threshold.
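
A minimal sketch of this step follows. The value 1000 comes from the step itself and is a teaching cutoff for this exercise, not a general rule for when Hadoop pays off.

```python
# Size in gigabytes above which this exercise recommends Hadoop.
big_data_threshold = 1000
```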

3
Select scenarios suitable for Hadoop
Use a dictionary comprehension to create a new dictionary called hadoop_recommended that includes only the entries from data_scenarios where the data size is greater than big_data_threshold.
Hint: Use a dictionary comprehension with for scenario, size in data_scenarios.items() and an if condition that compares size with big_data_threshold.
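
The filter described in this step could be sketched as follows. The dictionary and threshold from the earlier steps are repeated so that the snippet runs on its own.

```python
# Inputs from the earlier steps; sizes are in gigabytes.
data_scenarios = {
    'small_logs': 10,
    'medium_db': 500,
    'large_clickstream': 5000,
    'huge_sensor_data': 20000,
}
big_data_threshold = 1000

# Keep only the scenarios whose size is strictly greater than the threshold.
hadoop_recommended = {
    scenario: size
    for scenario, size in data_scenarios.items()
    if size > big_data_threshold
}
```

Because the comparison is strict, a scenario of exactly 1000 GB would not be selected. None of the example entries sits on that boundary.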

4
Print the Hadoop recommended scenarios
Write a print statement to display the hadoop_recommended dictionary.
Hint: Use print(hadoop_recommended) to show the dictionary.
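
Putting the four steps together, the finished mini project might look like the sketch below. The expected output assumes Python 3.7 or later, where dictionaries keep insertion order.

```python
# Step 1: example data scenarios, sizes in gigabytes.
data_scenarios = {
    'small_logs': 10,
    'medium_db': 500,
    'large_clickstream': 5000,
    'huge_sensor_data': 20000,
}

# Step 2: size in gigabytes above which Hadoop is recommended in this exercise.
big_data_threshold = 1000

# Step 3: select the scenarios larger than the threshold.
hadoop_recommended = {
    scenario: size
    for scenario, size in data_scenarios.items()
    if size > big_data_threshold
}

# Step 4: display the result.
print(hadoop_recommended)
# Prints: {'large_clickstream': 5000, 'huge_sensor_data': 20000}
```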