Hadoop · ~30 mins

Why Hadoop Was Created for Big Data

Understanding Why Hadoop Was Created for Big Data
📖 Scenario: Imagine you work at a company that collects a huge amount of data every day: millions of photos, videos, and logs from users. You want to analyze this data to find useful information, but a single computer cannot handle such a big load. This is a common problem called "big data". Hadoop was created to solve it by storing and processing very large data sets across many computers.
🎯 Goal: In this project, you will create a simple Python dictionary to represent big data storage needs, set a threshold for data size, filter data sets that are too big for a single computer, and finally print the filtered data sets. This will help you understand why Hadoop was created to handle big data.
📋 What You'll Learn
Create a dictionary called data_sizes with exact keys and values representing data set names and their sizes in terabytes.
Create a variable called max_single_node_size to represent the maximum data size a single computer can handle.
Use a dictionary comprehension to create a new dictionary called big_data_sets that only includes data sets larger than max_single_node_size.
Print the big_data_sets dictionary.
💡 Why This Matters
🌍 Real World
Companies like social media platforms, banks, and online stores collect huge amounts of data daily. Hadoop helps them store and analyze this big data by splitting it across many computers.
💼 Career
Understanding why Hadoop was created helps data scientists and engineers design systems that can handle large-scale data processing efficiently.
1
Create the data sizes dictionary
Create a dictionary called data_sizes with these exact entries: 'user_logs': 5, 'video_files': 50, 'image_collections': 20, 'sensor_data': 2, 'transaction_records': 15. The values represent data sizes in terabytes.
Need a hint?

Use curly braces {} to create the dictionary with the exact keys and values.

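If you want to check your work, a minimal sketch of this step might look like the following (keys and values taken directly from the step description):

```python
# Data set names mapped to their sizes in terabytes
data_sizes = {
    'user_logs': 5,
    'video_files': 50,
    'image_collections': 20,
    'sensor_data': 2,
    'transaction_records': 15,
}
```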
2
Set the maximum single node data size
Create a variable called max_single_node_size and set it to 10. This represents the maximum data size in terabytes that a single computer can handle.
Need a hint?

Just assign the number 10 to the variable max_single_node_size.

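This step is a single assignment; a sketch might look like:

```python
# Maximum data size (in terabytes) that one computer can handle in this exercise
max_single_node_size = 10
```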
3
Filter big data sets using dictionary comprehension
Use a dictionary comprehension to create a new dictionary called big_data_sets that includes only the entries from data_sizes where the size is greater than max_single_node_size. Use for dataset, size in data_sizes.items() in your comprehension.
Need a hint?

Use dictionary comprehension syntax: {key: value for key, value in dict.items() if condition}.

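One possible sketch of this step, restating the values from Steps 1 and 2 so it runs on its own:

```python
data_sizes = {
    'user_logs': 5,
    'video_files': 50,
    'image_collections': 20,
    'sensor_data': 2,
    'transaction_records': 15,
}
max_single_node_size = 10

# Keep only the data sets larger than what a single node can handle
big_data_sets = {dataset: size
                 for dataset, size in data_sizes.items()
                 if size > max_single_node_size}
# big_data_sets is now {'video_files': 50, 'image_collections': 20, 'transaction_records': 15}
```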
4
Print the big data sets
Print the big_data_sets dictionary to display the data sets that are too big for a single computer.
Need a hint?

Use print(big_data_sets) to show the filtered dictionary.
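Putting all four steps together, a complete sketch of the exercise might look like:

```python
# Step 1: data set names mapped to their sizes in terabytes
data_sizes = {
    'user_logs': 5,
    'video_files': 50,
    'image_collections': 20,
    'sensor_data': 2,
    'transaction_records': 15,
}

# Step 2: maximum size (in terabytes) a single computer can handle
max_single_node_size = 10

# Step 3: filter out anything a single node could still manage
big_data_sets = {dataset: size
                 for dataset, size in data_sizes.items()
                 if size > max_single_node_size}

# Step 4: show the data sets that need a distributed system like Hadoop
print(big_data_sets)
# → {'video_files': 50, 'image_collections': 20, 'transaction_records': 15}
```

The three data sets that remain are exactly the ones that motivate Hadoop: each is too large for the single node, so the work has to be split across many computers.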