Hadoop · data · ~15 mins

Hadoop distributions (Cloudera, Hortonworks) - Mini Project: Build & Apply

Explore Hadoop Distributions: Cloudera and Hortonworks
📖 Scenario: You are working as a data engineer in a company that uses Hadoop for big data processing. Your manager wants you to understand the differences between two popular Hadoop distributions: Cloudera and Hortonworks. This knowledge will help you decide which distribution to use for your next project.
🎯 Goal: Build a simple data structure to compare features of Cloudera and Hortonworks distributions, then filter and display key differences.
📋 What You'll Learn
Create a dictionary with Hadoop distributions and their features
Add a configuration variable to select a feature to compare
Use a loop to filter distributions based on the selected feature
Print the filtered results clearly
💡 Why This Matters
🌍 Real World
Understanding Hadoop distributions helps data engineers choose the right tools for big data projects based on features like security and support.
💼 Career
Knowledge of Hadoop distributions is important for roles like data engineer, big data analyst, and system administrator working with Hadoop ecosystems.
Step 1: Create a dictionary of Hadoop distributions and their features
Create a dictionary called hadoop_distributions with these exact entries: 'Cloudera' with features ['Security', 'Management', 'Support'] and 'Hortonworks' with features ['Open Source', 'Security', 'Integration'].
Hint: Use curly braces {} to create a dictionary. Each key is a distribution name, and the value is a list of features.
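One way this step might look, as a minimal sketch. It assumes the exercise is written in Python, which the page does not state outright but which the dictionary, .items(), and print wording suggest.

```python
# Map each distribution name to the list of features given in the step text.
hadoop_distributions = {
    'Cloudera': ['Security', 'Management', 'Support'],
    'Hortonworks': ['Open Source', 'Security', 'Integration'],
}
```

Keeping the features in a list per key lets later steps test membership with the in operator.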

Step 2: Add a feature to filter distributions
Create a variable called feature_to_check and set it to the string 'Security'.
Hint: Assign the string 'Security' to the variable feature_to_check.
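A minimal sketch of this step, assuming Python. The value acts as a small configuration setting: the filtering logic in the next step reads it rather than hard-coding the feature name.

```python
# The feature used to filter distributions; the string must match the
# spelling and capitalization used in the feature lists exactly.
feature_to_check = 'Security'
```

Changing this one string (for example to 'Support') would change which distributions the later loop selects, without touching the loop itself.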

Step 3: Filter distributions by the selected feature
Create a list called distributions_with_feature that contains the names of distributions from hadoop_distributions where feature_to_check is in their features list. Use a for loop with variables distribution and features to iterate over hadoop_distributions.items().
Hint: Use a for loop to check each distribution's features. If feature_to_check is in the features list, append the distribution name to distributions_with_feature.
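The filtering described in this step could be sketched as follows, assuming Python. The dictionary and the selected feature from the earlier steps are repeated so the snippet runs on its own.

```python
# Data and configuration from the earlier steps.
hadoop_distributions = {
    'Cloudera': ['Security', 'Management', 'Support'],
    'Hortonworks': ['Open Source', 'Security', 'Integration'],
}
feature_to_check = 'Security'

# Collect the names of distributions whose feature list contains the feature.
distributions_with_feature = []
for distribution, features in hadoop_distributions.items():
    if feature_to_check in features:
        distributions_with_feature.append(distribution)
```

Both entries list 'Security', so both names end up in the result, in the order they were added to the dictionary (Python 3.7 and later preserve insertion order).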

Step 4: Print the distributions that have the selected feature
Write a print statement to display the text 'Distributions with feature Security:' followed by the list distributions_with_feature.
Hint: Use print with a string and the list variable separated by a comma.
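Putting the four steps together, the finished exercise might look like this sketch, assuming Python. Passing two arguments to print separates them with a single space.

```python
# Step 1: distributions and their features.
hadoop_distributions = {
    'Cloudera': ['Security', 'Management', 'Support'],
    'Hortonworks': ['Open Source', 'Security', 'Integration'],
}

# Step 2: the feature to filter on.
feature_to_check = 'Security'

# Step 3: keep the distributions that list that feature.
distributions_with_feature = []
for distribution, features in hadoop_distributions.items():
    if feature_to_check in features:
        distributions_with_feature.append(distribution)

# Step 4: display the label text followed by the list.
print('Distributions with feature Security:', distributions_with_feature)
# prints: Distributions with feature Security: ['Cloudera', 'Hortonworks']
```

The label text is fixed, as the step requires, so it would not update if feature_to_check were changed to another value.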