Hadoop · ~30 mins

Why Hadoop Was Created for Big Data

Understanding Why Hadoop Was Created for Big Data
📖 Scenario: Imagine you work at a company that collects a huge amount of data every day: millions of photos, videos, and logs from users. You want to analyze this data to find useful information, but a single computer cannot handle such a big load. This is a common problem called "big data". Hadoop was created to solve it by storing and processing very large data sets across many computers.
🎯 Goal: In this project, you will create a simple Python dictionary to represent big data storage needs, set a threshold for data size, filter data sets that are too big for a single computer, and finally print the filtered data sets. This will help you understand why Hadoop was created to handle big data.
📋 What You'll Learn
Create a dictionary called data_sizes with exact keys and values representing data set names and their sizes in terabytes.
Create a variable called max_single_node_size to represent the maximum data size a single computer can handle.
Use a dictionary comprehension to create a new dictionary called big_data_sets that only includes data sets larger than max_single_node_size.
Print the big_data_sets dictionary.
💡 Why This Matters
🌍 Real World
Companies like social media platforms, banks, and online stores collect huge amounts of data daily. Hadoop helps them store and analyze this big data by splitting it across many computers.
💼 Career
Understanding why Hadoop was created helps data scientists and engineers design systems that can handle large-scale data processing efficiently.
1
Create the data sizes dictionary
Create a dictionary called data_sizes with these exact entries: 'user_logs': 5, 'video_files': 50, 'image_collections': 20, 'sensor_data': 2, 'transaction_records': 15. The values represent data sizes in terabytes.
Need a hint?

Use curly braces {} to create the dictionary with the exact keys and values.

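If you want to check your work, a minimal sketch of this step might look like the following (keys and values taken directly from the step description):

```python
# Data set names mapped to their sizes in terabytes
data_sizes = {
    'user_logs': 5,
    'video_files': 50,
    'image_collections': 20,
    'sensor_data': 2,
    'transaction_records': 15,
}
```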
2
Set the maximum single node data size
Create a variable called max_single_node_size and set it to 10. This represents the maximum data size in terabytes that a single computer can handle.
Need a hint?

Just assign the number 10 to the variable max_single_node_size.

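This step is a single assignment; a sketch might look like:

```python
# Maximum data size (in terabytes) that one computer can handle in this exercise
max_single_node_size = 10
```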
3
Filter big data sets using dictionary comprehension
Use a dictionary comprehension to create a new dictionary called big_data_sets that includes only the entries from data_sizes where the size is greater than max_single_node_size. Use for dataset, size in data_sizes.items() in your comprehension.
Need a hint?

Use dictionary comprehension syntax: {key: value for key, value in dict.items() if condition}.

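One possible sketch of this step, restating the values from Steps 1 and 2 so it runs on its own:

```python
data_sizes = {
    'user_logs': 5,
    'video_files': 50,
    'image_collections': 20,
    'sensor_data': 2,
    'transaction_records': 15,
}
max_single_node_size = 10

# Keep only the data sets larger than what a single node can handle
big_data_sets = {dataset: size
                 for dataset, size in data_sizes.items()
                 if size > max_single_node_size}
# big_data_sets is now {'video_files': 50, 'image_collections': 20, 'transaction_records': 15}
```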
4
Print the big data sets
Print the big_data_sets dictionary to display the data sets that are too big for a single computer.
Need a hint?

Use print(big_data_sets) to show the filtered dictionary.
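Putting all four steps together, a complete sketch of the exercise might look like:

```python
# Step 1: data set names mapped to their sizes in terabytes
data_sizes = {
    'user_logs': 5,
    'video_files': 50,
    'image_collections': 20,
    'sensor_data': 2,
    'transaction_records': 15,
}

# Step 2: maximum size (in terabytes) a single computer can handle
max_single_node_size = 10

# Step 3: filter out anything a single node could still manage
big_data_sets = {dataset: size
                 for dataset, size in data_sizes.items()
                 if size > max_single_node_size}

# Step 4: show the data sets that need a distributed system like Hadoop
print(big_data_sets)
# → {'video_files': 50, 'image_collections': 20, 'transaction_records': 15}
```

The three data sets that remain are exactly the ones that motivate Hadoop: each is too large for the single node, so the work has to be split across many computers.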