
How to Start a Hadoop Cluster: Step-by-Step Guide

To start a Hadoop cluster, you launch the HDFS daemons (the NameNode on the master and DataNodes on the workers) and the YARN daemons for resource management. Run sbin/start-dfs.sh to start the HDFS services and sbin/start-yarn.sh to start YARN.

Syntax

Starting a Hadoop cluster involves running shell scripts that launch the core services. The main commands are:

  • sbin/start-dfs.sh: Starts the Hadoop Distributed File System (HDFS) daemons including NameNode and DataNodes.
  • sbin/start-yarn.sh: Starts the YARN daemons including ResourceManager and NodeManagers for resource management.
  • sbin/stop-dfs.sh and sbin/stop-yarn.sh: Used to stop these services.

These scripts are located in the sbin directory of your Hadoop installation.
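The scripts assume the cluster is already configured. For pseudo-distributed mode on a single machine, Hadoop's setup guide calls for a minimal core-site.xml and hdfs-site.xml under etc/hadoop; a typical sketch (the port 9000 and replication factor 1 are the standard single-node defaults):

```xml
<!-- etc/hadoop/core-site.xml -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

<!-- etc/hadoop/hdfs-site.xml -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```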

bash
sbin/start-dfs.sh
sbin/start-yarn.sh

Example

This example shows how to start a Hadoop cluster on a single machine (pseudo-distributed mode). It starts HDFS and YARN services and checks their status.

bash
# Navigate to Hadoop installation directory
cd $HADOOP_HOME

# Start HDFS services (NameNode and DataNode)
sbin/start-dfs.sh

# Start YARN services (ResourceManager and NodeManager)
sbin/start-yarn.sh

# Check running Java processes related to Hadoop
jps
Output
NameNode
DataNode
SecondaryNameNode
ResourceManager
NodeManager
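Checking that output by eye is easy to get wrong, so you may want to script it. The helper below is a hypothetical sketch, not part of Hadoop: it compares a captured jps listing against the five daemon names above and reports anything missing. It runs here against a canned sample (the PIDs are made up) so it works even without a live cluster.

```shell
#!/bin/sh
# Hypothetical helper: report which expected Hadoop daemons are missing
# from a captured `jps` listing. The daemon list matches the output above.
check_daemons() {
    listing="$1"
    missing=""
    for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
        echo "$listing" | grep -qw "$d" || missing="$missing $d"
    done
    if [ -z "$missing" ]; then
        echo "all daemons running"
    else
        echo "missing:$missing"
    fi
}

# Canned sample standing in for real `jps` output (PIDs are made up):
sample="4112 NameNode
4201 DataNode
4388 SecondaryNameNode
4455 Jps"
check_daemons "$sample"
```

Against a live cluster, replace the canned sample with check_daemons "$(jps)".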

Common Pitfalls

Common mistakes when starting a Hadoop cluster include:

  • Not formatting the NameNode before the first start. Run hdfs namenode -format once before starting the cluster for the first time; rerunning it on an existing cluster wipes the HDFS metadata.
  • Environment variables such as JAVA_HOME or HADOOP_HOME not set correctly.
  • Firewall or network issues blocking communication between nodes.
  • Starting services without passwordless SSH configured, which the start scripts require on multi-node clusters.

Always check logs in $HADOOP_HOME/logs for errors if services fail to start.
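A quick way to surface failures is to grep the most recent daemon log for ERROR and FATAL lines. The sketch below writes a made-up sample log so it runs anywhere; against a real installation you would point the grep at a file under $HADOOP_HOME/logs instead (the log lines shown are illustrative, not copied from a real cluster).

```shell
#!/bin/sh
# Write a made-up sample log so this sketch is self-contained; on a real
# cluster, scan e.g. $HADOOP_HOME/logs/hadoop-*-namenode-*.log instead.
log=$(mktemp)
cat > "$log" <<'EOF'
2024-01-01 10:00:01 INFO  namenode.NameNode: STARTUP_MSG
2024-01-01 10:00:02 ERROR namenode.NameNode: NameNode is not formatted.
2024-01-01 10:00:02 FATAL namenode.NameNode: Failed to start namenode.
EOF

# Show only the lines that explain a failed start.
grep -E 'ERROR|FATAL' "$log"
rm -f "$log"
```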

bash
# Wrong: Starting cluster without formatting NameNode
sbin/start-dfs.sh

# Right: Format NameNode first
hdfs namenode -format
sbin/start-dfs.sh
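For the SSH pitfall, the start scripts need passwordless SSH even to localhost in pseudo-distributed mode. A minimal setup, following the commands in Hadoop's single-node setup guide (guarded so an existing key is not overwritten):

```shell
#!/bin/sh
# Set up passwordless SSH to localhost, as the start scripts require.
# Skip key generation if a key already exists to avoid clobbering it.
[ -f ~/.ssh/id_rsa ] || ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
# Verify with: ssh localhost  (should not prompt for a password)
```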

Quick Reference

Command                  Purpose
hdfs namenode -format    Format the NameNode before the first start
sbin/start-dfs.sh        Start HDFS daemons (NameNode, DataNodes)
sbin/start-yarn.sh       Start YARN daemons (ResourceManager, NodeManagers)
sbin/stop-dfs.sh         Stop HDFS daemons
sbin/stop-yarn.sh        Stop YARN daemons
jps                      List running Hadoop Java processes

Key Takeaways

  • Always format the NameNode once before starting the Hadoop cluster for the first time.
  • Use the start-dfs.sh and start-yarn.sh scripts to launch the core Hadoop services.
  • Confirm cluster startup by checking the running daemons with jps.
  • Ensure environment variables and SSH are configured correctly, especially for multi-node clusters.
  • Review the logs in $HADOOP_HOME/logs for troubleshooting if services fail to start.