0
0
Hadoopdata~15 mins

Hadoop vs Spark comparison - Hands-On Comparison

Choose your learning style9 modes available
Hadoop vs Spark Comparison
📖 Scenario: You work as a data analyst in a company that uses big data tools. Your manager wants you to compare two popular big data frameworks: Hadoop and Spark. You will create a small dataset with their features and performance metrics, then filter and display the best option based on speed.
🎯 Goal: Build a Python program that stores Hadoop and Spark data in a dictionary, sets a speed threshold, filters frameworks faster than the threshold, and prints the filtered results.
📋 What You'll Learn
Create a dictionary called frameworks with keys 'Hadoop' and 'Spark' and values as dictionaries containing 'speed' and 'ease_of_use' ratings.
Create a variable called speed_threshold with a numeric value.
Use a dictionary comprehension to create a new dictionary fast_frameworks with only frameworks having speed greater than speed_threshold.
Print the fast_frameworks dictionary.
💡 Why This Matters
🌍 Real World
Companies often compare big data tools like Hadoop and Spark to choose the best one for their needs based on speed and usability.
💼 Career
Data analysts and engineers must understand how to organize and filter data to make informed decisions about technology choices.
Progress0 / 4 steps
1
Create the data dictionary
Create a dictionary called frameworks with these exact entries: 'Hadoop': {'speed': 5, 'ease_of_use': 3} and 'Spark': {'speed': 9, 'ease_of_use': 8}.
Hadoop
Need a hint?

Use a dictionary with keys 'Hadoop' and 'Spark'. Each value is another dictionary with keys 'speed' and 'ease_of_use'. Use the exact numbers given.

2
Set the speed threshold
Create a variable called speed_threshold and set it to 6.
Hadoop
Need a hint?

Just create a variable named speed_threshold and assign the number 6.

3
Filter frameworks by speed
Use a dictionary comprehension to create a new dictionary called fast_frameworks that includes only the frameworks from frameworks where the 'speed' value is greater than speed_threshold.
Hadoop
Need a hint?

Use a dictionary comprehension with for name, data in frameworks.items() and filter with if data['speed'] > speed_threshold.

4
Print the filtered frameworks
Write a print statement to display the fast_frameworks dictionary.
Hadoop
Need a hint?

Use print(fast_frameworks) to show the filtered dictionary.