How to Add a Node to a Hadoop Cluster: Step-by-Step Guide
To add a new node to a Hadoop cluster, first install Hadoop on the new machine and configure its
core-site.xml, hdfs-site.xml, and yarn-site.xml files to match the cluster settings. Then update the slaves file on the master node to include the new node's hostname, and restart the Hadoop services to apply the changes.

Syntax
Adding a node involves updating configuration files and restarting services. Key steps include:
- Install Hadoop on the new node.
- Configure the XML files: core-site.xml, hdfs-site.xml, yarn-site.xml.
- Update the slaves file on the master node to list the new node.
- Restart Hadoop services so the cluster recognizes the new node.
```bash
# On the new node: install Hadoop (assumes a package repo provides it;
# otherwise unpack the same Hadoop tarball the cluster uses)
ssh new-node "sudo apt-get install hadoop"

# Copy config files to the new node
scp core-site.xml hdfs-site.xml yarn-site.xml new-node:/etc/hadoop/

# On the master node: add the new node's hostname to the slaves file
echo "new-node-hostname" >> $HADOOP_HOME/etc/hadoop/slaves

# Restart services
$HADOOP_HOME/sbin/stop-dfs.sh
$HADOOP_HOME/sbin/start-dfs.sh
$HADOOP_HOME/sbin/stop-yarn.sh
$HADOOP_HOME/sbin/start-yarn.sh
```
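The flow above follows the Hadoop 2.x convention. As a hedged note: in Hadoop 3.x the slaves file was renamed to workers, and a full cluster restart is usually unnecessary, since the new node's daemons can be started individually. A sketch, assuming a Hadoop 3.x layout with the same $HADOOP_HOME on every node:

```shell
# Hadoop 3.x sketch (assumption: same $HADOOP_HOME layout on every node).
# The host list lives in etc/hadoop/workers instead of etc/hadoop/slaves:
echo "new-node-hostname" >> $HADOOP_HOME/etc/hadoop/workers

# On the new node, start only the daemons it needs -- no cluster-wide restart:
$HADOOP_HOME/bin/hdfs --daemon start datanode
$HADOOP_HOME/bin/yarn --daemon start nodemanager
```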
Example
This example shows how to add a node named node3 to an existing Hadoop cluster.
```bash
# Step 1: on node3, install Hadoop and copy the cluster's config files
sudo apt-get update && sudo apt-get install -y hadoop
scp master:/etc/hadoop/core-site.xml /etc/hadoop/
scp master:/etc/hadoop/hdfs-site.xml /etc/hadoop/
scp master:/etc/hadoop/yarn-site.xml /etc/hadoop/

# Step 2: on the master node, add node3 to the slaves file
echo "node3" >> $HADOOP_HOME/etc/hadoop/slaves

# Step 3: restart Hadoop services on the master node
$HADOOP_HOME/sbin/stop-dfs.sh
$HADOOP_HOME/sbin/start-dfs.sh
$HADOOP_HOME/sbin/stop-yarn.sh
$HADOOP_HOME/sbin/start-yarn.sh

# Step 4: verify node3 was added
hdfs dfsadmin -report
```
Output
```
Configured Capacity: 1000 GB
DFS Used: 200 GB
Non DFS Used: 50 GB
DFS Remaining: 750 GB
Live datanodes (3):
Name: node1:50010 (Decommissioned: false)
Name: node2:50010 (Decommissioned: false)
Name: node3:50010 (Decommissioned: false)
```
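When scripting this check, the live-node count can be pulled out of a saved report. A minimal sketch, assuming the report was captured with `hdfs dfsadmin -report > /tmp/report.txt` (a sample report is written inline here so the snippet is self-contained):

```shell
# Write a sample dfsadmin report so the snippet runs standalone;
# on a real cluster this file would come from: hdfs dfsadmin -report
cat > /tmp/report.txt <<'EOF'
Live datanodes (3):
Name: node1:50010 (Decommissioned: false)
Name: node2:50010 (Decommissioned: false)
Name: node3:50010 (Decommissioned: false)
EOF

# Each DataNode entry starts with "Name: ", so counting those lines
# gives the number of live DataNodes
live=$(grep -c '^Name: ' /tmp/report.txt)
echo "live datanodes: $live"
```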
Common Pitfalls
- Not copying the exact configuration files to the new node causes mismatched settings.
- Forgetting to add the new node's hostname to the slaves file on the master node.
- Not restarting Hadoop services after changes, so the cluster does not recognize the new node.
- Network or SSH issues preventing communication between master and new node.
```bash
# Wrong: forgetting to update the slaves file
# (no change made on the master node)

# Right: add the new node's hostname, then restart services
echo "node3" >> $HADOOP_HOME/etc/hadoop/slaves
$HADOOP_HOME/sbin/stop-dfs.sh
$HADOOP_HOME/sbin/start-dfs.sh
```
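The first pitfall, mismatched configuration files, can be caught by comparing checksums or doing a byte-for-byte comparison. A hedged sketch: two local directories stand in for the master's config dir and a copy fetched from the new node, so the snippet is self-contained.

```shell
# Simulate a master config dir and a (stale) copy from the new node;
# on a real cluster, fetch the new node's file with scp first
mkdir -p /tmp/master-conf /tmp/node3-conf
echo '<configuration><property/></configuration>' > /tmp/master-conf/core-site.xml
echo '<configuration/>' > /tmp/node3-conf/core-site.xml

# cmp -s exits 0 only if the files are byte-identical
if cmp -s /tmp/master-conf/core-site.xml /tmp/node3-conf/core-site.xml; then
  echo "core-site.xml matches"
else
  echo "core-site.xml differs: copy the master's file to the new node"
fi
```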
Quick Reference
Summary tips for adding a node to a Hadoop cluster:
- Install Hadoop on the new node with a matching version.
- Copy the master node's configuration files to the new node.
- Add the new node's hostname to the slaves file on the master.
- Restart HDFS and YARN services on the master node.
- Verify the new node is live using hdfs dfsadmin -report.
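One detail worth guarding against: appending with `>>` adds a duplicate entry if the script is run twice. A minimal sketch that makes the append idempotent; a temp file stands in for $HADOOP_HOME/etc/hadoop/slaves so the snippet is self-contained:

```shell
# Temp file in place of $HADOOP_HOME/etc/hadoop/slaves for illustration
WORKERS=/tmp/slaves
printf 'node1\nnode2\n' > "$WORKERS"

NEW=node3
# -q quiet, -x whole-line match, -F fixed string: append only if absent
grep -qxF "$NEW" "$WORKERS" || echo "$NEW" >> "$WORKERS"
grep -qxF "$NEW" "$WORKERS" || echo "$NEW" >> "$WORKERS"   # second run: no-op

cat "$WORKERS"
```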
Key Takeaways
- Install and configure Hadoop on the new node with the same settings as the cluster.
- Add the new node's hostname to the master node's slaves file to include it in the cluster.
- Restart Hadoop services after configuration changes so the cluster picks up the new node.
- Verify the new node is active using Hadoop's dfsadmin report command.
- Ensure network connectivity and SSH access between the master and the new node.
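The last point can be checked before touching the cluster. A hedged sketch, where "node3" is a placeholder hostname; BatchMode makes ssh fail fast instead of prompting for a password:

```shell
# Non-interactive SSH probe: succeeds only if key-based login works
# and the host is reachable within 5 seconds
if ssh -o BatchMode=yes -o ConnectTimeout=5 node3 true 2>/dev/null; then
  echo "SSH to node3: OK"
else
  echo "SSH to node3: failed (check keys, /etc/hosts, firewall)"
fi
```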