0
0
Hadoopdata~30 mins

Cluster planning and sizing in Hadoop - Mini Project: Build & Apply

Choose your learning style9 modes available
Cluster Planning and Sizing
📖 Scenario: You are working as a data engineer preparing to set up a Hadoop cluster for a company. The company needs to process large amounts of data efficiently. To do this, you must plan the cluster size based on the data volume and processing needs.
🎯 Goal: Build a simple program to calculate the number of nodes needed in a Hadoop cluster based on data size and node capacity.
📋 What You'll Learn
Create a variable with the total data size in terabytes (TB)
Create a variable with the capacity of one node in terabytes (TB)
Calculate the number of nodes needed using division and rounding up
Print the number of nodes required
💡 Why This Matters
🌍 Real World
Planning the size of a Hadoop cluster helps companies allocate resources efficiently and avoid under or over-provisioning.
💼 Career
Data engineers and system administrators use cluster sizing to ensure data processing runs smoothly and cost-effectively.
Progress0 / 4 steps
1
Set up data size and node capacity
Create a variable called total_data_tb and set it to 120 to represent total data size in terabytes. Also create a variable called node_capacity_tb and set it to 10 to represent the capacity of one node in terabytes.
Hadoop
Need a hint?

Use simple assignment to create variables with the exact names and values.

2
Add a configuration variable for safety margin
Create a variable called safety_margin and set it to 1.2 to add 20% extra capacity for safety.
Hadoop
Need a hint?

This variable will help us plan for extra capacity beyond the raw data size.

3
Calculate the number of nodes needed
Calculate the number of nodes needed by multiplying total_data_tb by safety_margin, then dividing by node_capacity_tb. Use the math.ceil() function to round up the result. Store the result in a variable called nodes_needed. Import the math module first.
Hadoop
Need a hint?

Use import math at the top. Then calculate and round up the nodes needed.

4
Print the number of nodes required
Print the text "Number of nodes needed:" followed by the value of nodes_needed.
Hadoop
Need a hint?

Use a print statement with a comma to print the label and the variable.