How to Start a Hadoop Cluster: Step-by-Step Guide
To start a Hadoop cluster, you start the NameNode and DataNode services on the master and worker nodes respectively. Use sbin/start-dfs.sh to start the HDFS services and sbin/start-yarn.sh to start the resource management (YARN) services.

Syntax
Starting a Hadoop cluster involves running shell scripts that launch the core services. The main commands are:
- sbin/start-dfs.sh: Starts the Hadoop Distributed File System (HDFS) daemons, including the NameNode and DataNodes.
- sbin/start-yarn.sh: Starts the YARN daemons, including the ResourceManager and NodeManagers, for resource management.
- sbin/stop-dfs.sh and sbin/stop-yarn.sh: Stop these services.
These scripts are located in the sbin directory of your Hadoop installation.
```bash
sbin/start-dfs.sh
sbin/start-yarn.sh
```
Example
This example shows how to start a Hadoop cluster on a single machine (pseudo-distributed mode). It starts HDFS and YARN services and checks their status.
```bash
# Navigate to the Hadoop installation directory
cd $HADOOP_HOME

# Start HDFS services (NameNode and DataNode)
sbin/start-dfs.sh

# Start YARN services (ResourceManager and NodeManager)
sbin/start-yarn.sh

# Check running Java processes related to Hadoop
jps
```
Output
```
NameNode
DataNode
ResourceManager
NodeManager
SecondaryNameNode
```
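If you want to verify the daemon list programmatically rather than reading the jps output by eye, a small sketch like the following can help. The daemon names come from the output above; the check_daemons helper and the sample listing are illustrative, not part of Hadoop.

```shell
# Illustrative helper (not a Hadoop command): checks that every expected
# daemon name appears in a jps-style listing of "<pid> <name>" lines.
check_daemons() {
  listing="$1"
  missing=""
  for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    # grep -w avoids matching "NameNode" inside "SecondaryNameNode"
    if ! printf '%s\n' "$listing" | grep -qw "$daemon"; then
      missing="$missing $daemon"
    fi
  done
  if [ -n "$missing" ]; then
    echo "MISSING:$missing"
  else
    echo "ALL_RUNNING"
  fi
}

# On a live cluster you would call: check_daemons "$(jps)"
check_daemons "1234 NameNode
2345 DataNode
3456 SecondaryNameNode
4567 ResourceManager
5678 NodeManager"   # prints ALL_RUNNING
```

On a multi-node cluster you would run the same check per host, since jps only lists processes on the local machine.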
Common Pitfalls
Common mistakes when starting a Hadoop cluster include:
- Not formatting the NameNode before starting the cluster. You must run hdfs namenode -format once before the first start.
- Environment variables such as JAVA_HOME or HADOOP_HOME not set properly.
- Firewall or network issues blocking communication between nodes.
- Trying to start services without passwordless SSH set up for multi-node clusters.
Always check logs in $HADOOP_HOME/logs for errors if services fail to start.
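The log check can itself be scripted. This is a minimal sketch assuming the default log location $HADOOP_HOME/logs; the inspect_logs helper is hypothetical, not a Hadoop command.

```shell
# Hypothetical helper: surface likely startup failures from the daemon logs.
inspect_logs() {
  logs_dir="$1"
  # Newest files first -- the failing daemon's log is usually at the top.
  ls -t "$logs_dir" 2>/dev/null | head -n 5
  # Scan all daemon logs for error and fatal events, show the latest few.
  grep -ihE "ERROR|FATAL" "$logs_dir"/*.log 2>/dev/null | tail -n 20
}

# /usr/local/hadoop is only a fallback guess; set HADOOP_HOME for your install.
inspect_logs "${HADOOP_HOME:-/usr/local/hadoop}/logs"
```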
```bash
# Wrong: starting the cluster without formatting the NameNode
sbin/start-dfs.sh

# Right: format the NameNode first
hdfs namenode -format
sbin/start-dfs.sh
```
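Several of the pitfalls above can be caught before running the start scripts. The following pre-flight sketch is illustrative (check_env is a hypothetical helper, not part of Hadoop); it checks the environment variables and the passwordless SSH that the start scripts rely on.

```shell
# Hypothetical pre-flight check for two common pitfalls:
# unset environment variables and missing passwordless SSH.
check_env() {
  missing=""
  for var in JAVA_HOME HADOOP_HOME; do
    eval "value=\${$var:-}"
    if [ -z "$value" ]; then
      missing="$missing $var"
    fi
  done
  if [ -n "$missing" ]; then
    echo "WARNING: unset variables:$missing"
  else
    echo "Environment variables look OK"
  fi
}

check_env

# Passwordless SSH to localhost is needed even in pseudo-distributed mode.
# BatchMode=yes makes ssh fail instead of prompting for a password.
if ssh -o BatchMode=yes -o ConnectTimeout=2 localhost true 2>/dev/null; then
  echo "Passwordless SSH to localhost works"
else
  echo "WARNING: passwordless SSH failed; run ssh-keygen and append the key to ~/.ssh/authorized_keys"
fi
```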
Quick Reference
| Command | Purpose |
|---|---|
| hdfs namenode -format | Format the NameNode before first start |
| sbin/start-dfs.sh | Start HDFS daemons (NameNode, DataNodes) |
| sbin/start-yarn.sh | Start YARN daemons (ResourceManager, NodeManagers) |
| sbin/stop-dfs.sh | Stop HDFS daemons |
| sbin/stop-yarn.sh | Stop YARN daemons |
| jps | Check running Hadoop Java processes |
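The ordering in the table matters: HDFS comes up before YARN, and YARN goes down before HDFS. One way to keep that order in one place is a small wrapper like the sketch below; run, start_cluster, stop_cluster, and the DRY_RUN switch are illustrative helpers, not Hadoop features.

```shell
# Illustrative wrapper that keeps the start/stop ordering in one place.
# Set DRY_RUN=1 to print the commands instead of executing them.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

start_cluster() {
  run "$HADOOP_HOME/sbin/start-dfs.sh"    # HDFS first: YARN jobs need the filesystem
  run "$HADOOP_HOME/sbin/start-yarn.sh"
}

stop_cluster() {
  run "$HADOOP_HOME/sbin/stop-yarn.sh"    # reverse order on shutdown
  run "$HADOOP_HOME/sbin/stop-dfs.sh"
}

# Print the startup sequence without executing it
DRY_RUN=1 start_cluster
```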
Key Takeaways
- Always format the NameNode once before starting the Hadoop cluster.
- Use start-dfs.sh and start-yarn.sh scripts to launch core Hadoop services.
- Check running services with the jps command to confirm cluster startup.
- Ensure environment variables and SSH setup are correctly configured for multi-node clusters.
- Review Hadoop logs for troubleshooting if services fail to start.