How to Add a Node to a Hadoop Cluster: Step-by-Step Guide
To add a new node to a Hadoop cluster, first install Hadoop on the new machine and configure its
core-site.xml, hdfs-site.xml, and yarn-site.xml files to match the cluster settings. Then update the slaves file on the master node to include the new node's hostname, and restart the Hadoop services to apply the changes.

Syntax
Adding a node involves updating configuration files and restarting services. Key steps include:
- Install Hadoop on the new node.
- Configure the XML files: core-site.xml, hdfs-site.xml, yarn-site.xml.
- Update the slaves file on the master node to list the new node.
- Restart Hadoop services so the cluster recognizes the new node.
```bash
# On the new node: install Hadoop (assumes a package repo provides it;
# otherwise unpack the same Hadoop tarball the cluster uses)
ssh new-node "sudo apt-get install hadoop"

# Copy config files to the new node
scp core-site.xml hdfs-site.xml yarn-site.xml new-node:/etc/hadoop/

# On the master node: add the new node's hostname to the slaves file
echo "new-node-hostname" >> $HADOOP_HOME/etc/hadoop/slaves

# Restart services
$HADOOP_HOME/sbin/stop-dfs.sh
$HADOOP_HOME/sbin/start-dfs.sh
$HADOOP_HOME/sbin/stop-yarn.sh
$HADOOP_HOME/sbin/start-yarn.sh
```
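The flow above follows the Hadoop 2.x convention. As a hedged note: in Hadoop 3.x the slaves file was renamed to workers, and a full cluster restart is usually unnecessary, since the new node's daemons can be started individually. A sketch, assuming a Hadoop 3.x layout with the same $HADOOP_HOME on every node:

```shell
# Hadoop 3.x sketch (assumption: same $HADOOP_HOME layout on every node).
# The host list lives in etc/hadoop/workers instead of etc/hadoop/slaves:
echo "new-node-hostname" >> $HADOOP_HOME/etc/hadoop/workers

# On the new node, start only the daemons it needs -- no cluster-wide restart:
$HADOOP_HOME/bin/hdfs --daemon start datanode
$HADOOP_HOME/bin/yarn --daemon start nodemanager
```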
Example
This example shows how to add a node named node3 to an existing Hadoop cluster.
```bash
# Step 1: on node3, install Hadoop and copy the cluster's config files
sudo apt-get update && sudo apt-get install -y hadoop
scp master:/etc/hadoop/core-site.xml /etc/hadoop/
scp master:/etc/hadoop/hdfs-site.xml /etc/hadoop/
scp master:/etc/hadoop/yarn-site.xml /etc/hadoop/

# Step 2: on the master node, add node3 to the slaves file
echo "node3" >> $HADOOP_HOME/etc/hadoop/slaves

# Step 3: restart Hadoop services on the master node
$HADOOP_HOME/sbin/stop-dfs.sh
$HADOOP_HOME/sbin/start-dfs.sh
$HADOOP_HOME/sbin/stop-yarn.sh
$HADOOP_HOME/sbin/start-yarn.sh

# Step 4: verify node3 was added
hdfs dfsadmin -report
```
Output
```
Configured Capacity: 1000 GB
DFS Used: 200 GB
Non DFS Used: 50 GB
DFS Remaining: 750 GB
Live datanodes (3):
Name: node1:50010 (Decommissioned: false)
Name: node2:50010 (Decommissioned: false)
Name: node3:50010 (Decommissioned: false)
```
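When scripting this check, the live-node count can be pulled out of a saved report. A minimal sketch, assuming the report was captured with `hdfs dfsadmin -report > /tmp/report.txt` (a sample report is written inline here so the snippet is self-contained):

```shell
# Write a sample dfsadmin report so the snippet runs standalone;
# on a real cluster this file would come from: hdfs dfsadmin -report
cat > /tmp/report.txt <<'EOF'
Live datanodes (3):
Name: node1:50010 (Decommissioned: false)
Name: node2:50010 (Decommissioned: false)
Name: node3:50010 (Decommissioned: false)
EOF

# Each DataNode entry starts with "Name: ", so counting those lines
# gives the number of live DataNodes
live=$(grep -c '^Name: ' /tmp/report.txt)
echo "live datanodes: $live"
```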
Common Pitfalls
- Not copying the exact configuration files to the new node causes mismatched settings.
- Forgetting to add the new node's hostname to the slaves file on the master node.
- Not restarting Hadoop services after changes, so the cluster does not recognize the new node.
- Network or SSH issues preventing communication between master and new node.
```bash
# Wrong: forgetting to update the slaves file
# (no change made on the master node)

# Right: add the new node's hostname, then restart services
echo "node3" >> $HADOOP_HOME/etc/hadoop/slaves
$HADOOP_HOME/sbin/stop-dfs.sh
$HADOOP_HOME/sbin/start-dfs.sh
```
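The first pitfall, mismatched configuration files, can be caught by comparing checksums or doing a byte-for-byte comparison. A hedged sketch: two local directories stand in for the master's config dir and a copy fetched from the new node, so the snippet is self-contained.

```shell
# Simulate a master config dir and a (stale) copy from the new node;
# on a real cluster, fetch the new node's file with scp first
mkdir -p /tmp/master-conf /tmp/node3-conf
echo '<configuration><property/></configuration>' > /tmp/master-conf/core-site.xml
echo '<configuration/>' > /tmp/node3-conf/core-site.xml

# cmp -s exits 0 only if the files are byte-identical
if cmp -s /tmp/master-conf/core-site.xml /tmp/node3-conf/core-site.xml; then
  echo "core-site.xml matches"
else
  echo "core-site.xml differs: copy the master's file to the new node"
fi
```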
Quick Reference
Summary tips for adding a node to a Hadoop cluster:
- Install Hadoop on the new node with a matching version.
- Copy the master node's configuration files to the new node.
- Add the new node's hostname to the slaves file on the master.
- Restart HDFS and YARN services on the master node.
- Verify the new node is live using hdfs dfsadmin -report.
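One detail worth guarding against: appending with `>>` adds a duplicate entry if the script is run twice. A minimal sketch that makes the append idempotent; a temp file stands in for $HADOOP_HOME/etc/hadoop/slaves so the snippet is self-contained:

```shell
# Temp file in place of $HADOOP_HOME/etc/hadoop/slaves for illustration
WORKERS=/tmp/slaves
printf 'node1\nnode2\n' > "$WORKERS"

NEW=node3
# -q quiet, -x whole-line match, -F fixed string: append only if absent
grep -qxF "$NEW" "$WORKERS" || echo "$NEW" >> "$WORKERS"
grep -qxF "$NEW" "$WORKERS" || echo "$NEW" >> "$WORKERS"   # second run: no-op

cat "$WORKERS"
```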
Key Takeaways
- Install and configure Hadoop on the new node with the same settings as the cluster.
- Add the new node's hostname to the master node's slaves file to include it in the cluster.
- Restart Hadoop services after configuration changes so the cluster picks up the new node.
- Verify the new node is active using Hadoop's dfsadmin report command.
- Ensure network connectivity and SSH access between the master and the new node.
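The last point can be checked before touching the cluster. A hedged sketch, where "node3" is a placeholder hostname; BatchMode makes ssh fail fast instead of prompting for a password:

```shell
# Non-interactive SSH probe: succeeds only if key-based login works
# and the host is reachable within 5 seconds
if ssh -o BatchMode=yes -o ConnectTimeout=5 node3 true 2>/dev/null; then
  echo "SSH to node3: OK"
else
  echo "SSH to node3: failed (check keys, /etc/hosts, firewall)"
fi
```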