
Hadoop vs Spark: Hands-On Comparison



📖 Scenario: You work as a data analyst at a company that uses big data tools. Your manager wants you to compare two popular big data frameworks, Hadoop and Spark. You will create a small dataset of their speed and ease-of-use ratings, then filter it and display the frameworks that exceed a speed threshold.

🎯 Goal: Build a Python program that stores Hadoop and Spark data in a dictionary, sets a speed threshold, filters frameworks faster than the threshold, and prints the filtered results.

📋 What You'll Learn

Create a dictionary called frameworks with the keys 'Hadoop' and 'Spark', whose values are dictionaries containing 'speed' and 'ease_of_use' ratings.

Create a variable called speed_threshold with a numeric value.

Use a dictionary comprehension to create a new dictionary fast_frameworks containing only the frameworks whose speed is greater than speed_threshold.

Print the fast_frameworks dictionary.

💡 Why This Matters

🌍 Real World

Companies often compare big data tools like Hadoop and Spark to choose the best one for their needs based on speed and usability.

💼 Career

Data analysts and engineers must understand how to organize and filter data to make informed decisions about technology choices.


Create the data dictionary

Create a dictionary called frameworks with these exact entries: 'Hadoop': {'speed': 5, 'ease_of_use': 3} and 'Spark': {'speed': 9, 'ease_of_use': 8}.


# Create the frameworks dictionary with Hadoop and Spark data
# Your code here

Need a hint?

Use a dictionary with keys 'Hadoop' and 'Spark'. Each value is another dictionary with keys 'speed' and 'ease_of_use'. Use the exact numbers given.
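As a minimal sketch of how a nested dictionary like this behaves, the snippet below builds the structure described in this step and reads values from it by chaining keys. The variable spark_speed and the printed lines are illustrative only and are not part of the exercise.

```python
# Outer keys are framework names; each value is an inner dictionary of ratings.
frameworks = {
    'Hadoop': {'speed': 5, 'ease_of_use': 3},
    'Spark': {'speed': 9, 'ease_of_use': 8},
}

# Chain two lookups: first the framework, then the rating.
# spark_speed is an illustrative name, not required by the exercise.
spark_speed = frameworks['Spark']['speed']
print(spark_speed)  # 9

# Iterate over the outer dictionary to see every framework's ratings.
for name, data in frameworks.items():
    print(name, data['speed'], data['ease_of_use'])
```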

Set the speed threshold

Create a variable called speed_threshold and set it to 6.


frameworks = {
    'Hadoop': {'speed': 5, 'ease_of_use': 3},
    'Spark': {'speed': 9, 'ease_of_use': 8}
}
# Set the speed threshold variable
# Your code here

Need a hint?

Just create a variable named speed_threshold and assign the number 6.

Filter frameworks by speed

Use a dictionary comprehension to create a new dictionary called fast_frameworks that includes only the frameworks from frameworks where the 'speed' value is greater than speed_threshold.


frameworks = {
    'Hadoop': {'speed': 5, 'ease_of_use': 3},
    'Spark': {'speed': 9, 'ease_of_use': 8}
}
speed_threshold = 6
# Create fast_frameworks dictionary with frameworks faster than speed_threshold
# Your code here

Need a hint?

Use a dictionary comprehension with for name, data in frameworks.items() and filter with if data['speed'] > speed_threshold.
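To see what the comprehension in the hint does, it can help to compare it with an explicit loop. The sketch below builds the same filtered dictionary both ways; the name fast_frameworks_loop is hypothetical and exists only for the comparison.

```python
frameworks = {
    'Hadoop': {'speed': 5, 'ease_of_use': 3},
    'Spark': {'speed': 9, 'ease_of_use': 8},
}
speed_threshold = 6

# Explicit loop: copy an entry only when its speed exceeds the threshold.
fast_frameworks_loop = {}  # hypothetical name, used only for comparison
for name, data in frameworks.items():
    if data['speed'] > speed_threshold:
        fast_frameworks_loop[name] = data

# Dictionary comprehension: the same loop and condition in one expression.
fast_frameworks = {
    name: data
    for name, data in frameworks.items()
    if data['speed'] > speed_threshold
}

# Both approaches produce the same dictionary.
print(fast_frameworks == fast_frameworks_loop)  # True
```

Because the comparison is strictly greater than, a framework whose speed equals speed_threshold would be left out.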

Print the filtered frameworks

Call print() to display the fast_frameworks dictionary.


frameworks = {
    'Hadoop': {'speed': 5, 'ease_of_use': 3},
    'Spark': {'speed': 9, 'ease_of_use': 8}
}
speed_threshold = 6
fast_frameworks = {name: data for name, data in frameworks.items() if data['speed'] > speed_threshold}
# Print the fast_frameworks dictionary
# Your code here

Need a hint?

Use print(fast_frameworks) to show the filtered dictionary.
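Putting the four steps together, the finished program might look like the sketch below. It uses only the names and values given in the steps above.

```python
# Step 1: ratings for each framework.
frameworks = {
    'Hadoop': {'speed': 5, 'ease_of_use': 3},
    'Spark': {'speed': 9, 'ease_of_use': 8},
}

# Step 2: minimum speed a framework must exceed.
speed_threshold = 6

# Step 3: keep only frameworks whose speed is greater than the threshold.
fast_frameworks = {
    name: data
    for name, data in frameworks.items()
    if data['speed'] > speed_threshold
}

# Step 4: display the filtered dictionary.
print(fast_frameworks)
# Output: {'Spark': {'speed': 9, 'ease_of_use': 8}}
```

Hadoop is filtered out because its speed rating of 5 is not greater than the threshold of 6, so only Spark remains.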