How to Start a Hadoop Cluster: Step-by-Step Guide
To start a Hadoop cluster, you start the NameNode and DataNode services on the master and worker nodes respectively. Use sbin/start-dfs.sh to start the HDFS services and sbin/start-yarn.sh to start the resource management (YARN) services.

Syntax
Starting a Hadoop cluster involves running shell scripts that launch the core services. The main commands are:
- sbin/start-dfs.sh: Starts the Hadoop Distributed File System (HDFS) daemons, including the NameNode and DataNodes.
- sbin/start-yarn.sh: Starts the YARN daemons, including the ResourceManager and NodeManagers, for resource management.
- sbin/stop-dfs.sh and sbin/stop-yarn.sh: Stop these services.
These scripts are located in the sbin directory of your Hadoop installation.
```bash
sbin/start-dfs.sh
sbin/start-yarn.sh
```
Example
This example shows how to start a Hadoop cluster on a single machine (pseudo-distributed mode). It starts HDFS and YARN services and checks their status.
```bash
# Navigate to the Hadoop installation directory
cd $HADOOP_HOME

# Start HDFS services (NameNode and DataNode)
sbin/start-dfs.sh

# Start YARN services (ResourceManager and NodeManager)
sbin/start-yarn.sh

# Check running Java processes related to Hadoop
jps
```
Output
```
NameNode
DataNode
ResourceManager
NodeManager
SecondaryNameNode
```
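If you want to verify the daemon list programmatically rather than reading the jps output by eye, a small sketch like the following can help. The daemon names come from the output above; the check_daemons helper and the sample listing are illustrative, not part of Hadoop.

```shell
# Illustrative helper (not a Hadoop command): checks that every expected
# daemon name appears in a jps-style listing of "<pid> <name>" lines.
check_daemons() {
  listing="$1"
  missing=""
  for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    # grep -w avoids matching "NameNode" inside "SecondaryNameNode"
    if ! printf '%s\n' "$listing" | grep -qw "$daemon"; then
      missing="$missing $daemon"
    fi
  done
  if [ -n "$missing" ]; then
    echo "MISSING:$missing"
  else
    echo "ALL_RUNNING"
  fi
}

# On a live cluster you would call: check_daemons "$(jps)"
check_daemons "1234 NameNode
2345 DataNode
3456 SecondaryNameNode
4567 ResourceManager
5678 NodeManager"   # prints ALL_RUNNING
```

On a multi-node cluster you would run the same check per host, since jps only lists processes on the local machine.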
Common Pitfalls
Common mistakes when starting a Hadoop cluster include:
- Not formatting the NameNode before starting the cluster. You must run hdfs namenode -format once before the first start.
- Environment variables such as JAVA_HOME or HADOOP_HOME not set properly.
- Firewall or network issues blocking communication between nodes.
- Trying to start services without passwordless SSH set up for multi-node clusters.
Always check logs in $HADOOP_HOME/logs for errors if services fail to start.
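The log check can itself be scripted. This is a minimal sketch assuming the default log location $HADOOP_HOME/logs; the inspect_logs helper is hypothetical, not a Hadoop command.

```shell
# Hypothetical helper: surface likely startup failures from the daemon logs.
inspect_logs() {
  logs_dir="$1"
  # Newest files first -- the failing daemon's log is usually at the top.
  ls -t "$logs_dir" 2>/dev/null | head -n 5
  # Scan all daemon logs for error and fatal events, show the latest few.
  grep -ihE "ERROR|FATAL" "$logs_dir"/*.log 2>/dev/null | tail -n 20
}

# /usr/local/hadoop is only a fallback guess; set HADOOP_HOME for your install.
inspect_logs "${HADOOP_HOME:-/usr/local/hadoop}/logs"
```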
```bash
# Wrong: starting the cluster without formatting the NameNode
sbin/start-dfs.sh

# Right: format the NameNode first
hdfs namenode -format
sbin/start-dfs.sh
```
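Several of the pitfalls above can be caught before running the start scripts. The following pre-flight sketch is illustrative (check_env is a hypothetical helper, not part of Hadoop); it checks the environment variables and the passwordless SSH that the start scripts rely on.

```shell
# Hypothetical pre-flight check for two common pitfalls:
# unset environment variables and missing passwordless SSH.
check_env() {
  missing=""
  for var in JAVA_HOME HADOOP_HOME; do
    eval "value=\${$var:-}"
    if [ -z "$value" ]; then
      missing="$missing $var"
    fi
  done
  if [ -n "$missing" ]; then
    echo "WARNING: unset variables:$missing"
  else
    echo "Environment variables look OK"
  fi
}

check_env

# Passwordless SSH to localhost is needed even in pseudo-distributed mode.
# BatchMode=yes makes ssh fail instead of prompting for a password.
if ssh -o BatchMode=yes -o ConnectTimeout=2 localhost true 2>/dev/null; then
  echo "Passwordless SSH to localhost works"
else
  echo "WARNING: passwordless SSH failed; run ssh-keygen and append the key to ~/.ssh/authorized_keys"
fi
```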
Quick Reference
| Command | Purpose |
|---|---|
| hdfs namenode -format | Format the NameNode before first start |
| sbin/start-dfs.sh | Start HDFS daemons (NameNode, DataNodes) |
| sbin/start-yarn.sh | Start YARN daemons (ResourceManager, NodeManagers) |
| sbin/stop-dfs.sh | Stop HDFS daemons |
| sbin/stop-yarn.sh | Stop YARN daemons |
| jps | Check running Hadoop Java processes |
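The ordering in the table matters: HDFS comes up before YARN, and YARN goes down before HDFS. One way to keep that order in one place is a small wrapper like the sketch below; run, start_cluster, stop_cluster, and the DRY_RUN switch are illustrative helpers, not Hadoop features.

```shell
# Illustrative wrapper that keeps the start/stop ordering in one place.
# Set DRY_RUN=1 to print the commands instead of executing them.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

start_cluster() {
  run "$HADOOP_HOME/sbin/start-dfs.sh"    # HDFS first: YARN jobs need the filesystem
  run "$HADOOP_HOME/sbin/start-yarn.sh"
}

stop_cluster() {
  run "$HADOOP_HOME/sbin/stop-yarn.sh"    # reverse order on shutdown
  run "$HADOOP_HOME/sbin/stop-dfs.sh"
}

# Print the startup sequence without executing it
DRY_RUN=1 start_cluster
```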
Key Takeaways
- Always format the NameNode once before starting the Hadoop cluster.
- Use start-dfs.sh and start-yarn.sh scripts to launch core Hadoop services.
- Check running services with the jps command to confirm cluster startup.
- Ensure environment variables and SSH setup are correctly configured for multi-node clusters.
- Review Hadoop logs for troubleshooting if services fail to start.