Why HDFS Handles Petabyte-Scale Storage
📖 Scenario: Imagine you work for a company that collects huge amounts of data every day, like videos, logs, and sensor readings. You need a way to store all this data safely and access it quickly, even when it grows to petabytes (millions of gigabytes).
🎯 Goal: You will create a simple simulation to understand how HDFS (Hadoop Distributed File System) manages very large data by splitting it into blocks and storing copies across many machines.
📋 What You'll Learn
Create a dictionary to represent files and their sizes in gigabytes
Set a block size variable to split files into blocks
Calculate how many blocks each file needs
Print the number of blocks per file to see how HDFS handles large data
💡 Why This Matters
🌍 Real World
Big companies like Netflix and Facebook store massive data using HDFS to keep it safe and accessible.
💼 Career
Understanding HDFS block management is key for roles in big data engineering and data infrastructure.
Step 1: Create a dictionary of files with their sizes
Create a dictionary called files with these exact entries: 'video1.mp4': 1500, 'log_data.txt': 3000, 'sensor_readings.csv': 500. The sizes are in gigabytes.
Hint: Use curly braces {} to create a dictionary with file names as keys and sizes as values.
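A minimal sketch of this step, using exactly the entries listed above:

```python
# Dictionary mapping each file name to its size in gigabytes
files = {
    'video1.mp4': 1500,
    'log_data.txt': 3000,
    'sensor_readings.csv': 500,
}

print(files)
```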

Step 2: Set the block size for splitting files
Create a variable called block_size_gb and set it to 128 to represent the block size in gigabytes.
Hint: Just assign the number 128 to the variable block_size_gb.
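This step is a single assignment. Note that real HDFS defaults to 128 *megabyte* blocks; this simulation uses gigabytes to keep the numbers small:

```python
# Block size for the simulation, in gigabytes.
# (Real HDFS uses a default block size of 128 MB; we scale up to GB
# here so the example files need only a handful of blocks.)
block_size_gb = 128
```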

Step 3: Calculate the number of blocks for each file
Create a new dictionary called file_blocks. Use a for loop with variables file_name and size to iterate over files.items(). For each file, calculate the number of blocks by dividing size by block_size_gb and rounding up using math.ceil(). Store the result in file_blocks[file_name]. Import the math module first.
Hint: Use math.ceil() to round up the division result so partial blocks count as full blocks.
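Putting the loop together (repeating the setup from steps 1 and 2 so the snippet runs on its own):

```python
import math

files = {
    'video1.mp4': 1500,
    'log_data.txt': 3000,
    'sensor_readings.csv': 500,
}
block_size_gb = 128

# math.ceil() rounds up, so a file that only partially fills
# its last block still counts that block as used.
file_blocks = {}
for file_name, size in files.items():
    file_blocks[file_name] = math.ceil(size / block_size_gb)

print(file_blocks)
# {'video1.mp4': 12, 'log_data.txt': 24, 'sensor_readings.csv': 4}
```

For example, video1.mp4 is 1500 GB, and 1500 / 128 ≈ 11.7, which rounds up to 12 blocks.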

Step 4: Print the number of blocks per file
Use a for loop with variables file_name and blocks to iterate over file_blocks.items(). Print each file name and its number of blocks in the format: File: video1.mp4, Blocks: 12.
Hint: Use an f-string inside the print statement to format the output exactly as shown.
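The final step loops over the results and formats each line with an f-string. A complete, self-contained version of all four steps:

```python
import math

files = {
    'video1.mp4': 1500,
    'log_data.txt': 3000,
    'sensor_readings.csv': 500,
}
block_size_gb = 128

# Blocks per file, rounding partial blocks up to full blocks
file_blocks = {}
for file_name, size in files.items():
    file_blocks[file_name] = math.ceil(size / block_size_gb)

# Print each file and its block count in the required format
for file_name, blocks in file_blocks.items():
    print(f"File: {file_name}, Blocks: {blocks}")
# File: video1.mp4, Blocks: 12
# File: log_data.txt, Blocks: 24
# File: sensor_readings.csv, Blocks: 4
```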