How to Install Hadoop: Step-by-Step Guide for Beginners
To install Hadoop, download the latest stable release from the official Apache website, extract it, and configure environment variables such as HADOOP_HOME. Finally, format the Hadoop filesystem and start the Hadoop services with the start-dfs.sh and start-yarn.sh scripts.

Syntax
Installing Hadoop involves these main steps:
- Download: Get the Hadoop binary from the official Apache site.
- Extract: Unpack the downloaded archive to a chosen directory.
- Configure: Set environment variables like `HADOOP_HOME` and update `PATH`.
- Format: Initialize the Hadoop filesystem with `hdfs namenode -format`.
- Start: Launch Hadoop daemons using `start-dfs.sh` and `start-yarn.sh`.
```bash
wget https://downloads.apache.org/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz
tar -xzf hadoop-3.3.6.tar.gz
export HADOOP_HOME=/path/to/hadoop-3.3.6
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
hdfs namenode -format
start-dfs.sh
start-yarn.sh
```
Example
This example shows how to install Hadoop 3.3.6 on a Linux system, set environment variables, format the namenode, and start Hadoop services.
```bash
# Download Hadoop 3.3.6
wget https://downloads.apache.org/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz

# Extract the archive
tar -xzf hadoop-3.3.6.tar.gz

# Set environment variables (replace /home/user with your path)
export HADOOP_HOME=/home/user/hadoop-3.3.6
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

# Format the Hadoop filesystem
hdfs namenode -format

# Start Hadoop daemons
start-dfs.sh
start-yarn.sh

# Check running Java processes
jps
```
Output
Downloading hadoop-3.3.6.tar.gz...
Extracting archive...
Setting environment variables...
Formatting namenode: SUCCESS
Starting HDFS daemons...
Starting YARN daemons...
jps output:
12345 NameNode
12346 DataNode
12347 ResourceManager
12348 NodeManager
12349 SecondaryNameNode
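A healthy single-node installation shows all five daemons in the `jps` output above. The check can be scripted; this is a minimal sketch that uses a sample of the output shown above (in practice you would pipe `jps` directly instead of the hard-coded string):

```bash
# Sketch: verify that the expected Hadoop daemons appear in jps output.
# Sample output is hard-coded for illustration; replace with: jps_output=$(jps)
jps_output="12345 NameNode
12346 DataNode
12347 ResourceManager
12348 NodeManager
12349 SecondaryNameNode"

for daemon in NameNode DataNode ResourceManager NodeManager SecondaryNameNode; do
  # grep -w matches whole words, so "NameNode" does not match "SecondaryNameNode"
  if echo "$jps_output" | grep -qw "$daemon"; then
    echo "OK: $daemon"
  else
    echo "MISSING: $daemon"
  fi
done
```

If any daemon is missing, check its log file under `$HADOOP_HOME/logs` for the startup error.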
Common Pitfalls
Common mistakes when installing Hadoop include:
- Not setting the `JAVA_HOME` environment variable, which Hadoop needs to run.
- Forgetting to format the namenode before starting services, causing startup errors.
- Incorrect permissions on Hadoop directories, leading to access errors.
- Not updating `PATH` to include Hadoop binaries, so commands are not found.
Always verify environment variables and directory permissions before starting Hadoop.
```bash
# Wrong: starting Hadoop without formatting the namenode
start-dfs.sh

# Right: format first, then start
hdfs namenode -format
start-dfs.sh
```
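The `JAVA_HOME` pitfall is usually fixed by setting it in Hadoop's own environment file, `etc/hadoop/hadoop-env.sh`, so the daemons find Java regardless of the login shell. A minimal sketch; the JDK path is only an example, and the demo directory stands in for your real `HADOOP_HOME`:

```bash
# Sketch: persist JAVA_HOME in Hadoop's env file so the daemons can find Java.
HADOOP_HOME=${HADOOP_HOME:-/tmp/hadoop-demo}   # demo path for illustration
mkdir -p "$HADOOP_HOME/etc/hadoop"

JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64   # example; use your JDK location
echo "export JAVA_HOME=$JAVA_HOME" >> "$HADOOP_HOME/etc/hadoop/hadoop-env.sh"

# Confirm the setting was written
grep "JAVA_HOME" "$HADOOP_HOME/etc/hadoop/hadoop-env.sh"
```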
Quick Reference
Summary tips for installing Hadoop:
- Download the latest stable Hadoop release from Apache Hadoop Releases.
- Set `JAVA_HOME` and `HADOOP_HOME` environment variables correctly.
- Format the namenode before starting Hadoop services.
- Use the `jps` command to check running Hadoop daemons.
- Ensure proper permissions on Hadoop installation and data directories.
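The environment checks above can be bundled into a small pre-flight script. This is a sketch with a hypothetical `check_var` helper (not part of Hadoop); the paths used as defaults are examples only:

```bash
# Sketch: pre-flight check before starting Hadoop (hypothetical helper).
check_var() {
  # ${!1} is bash indirect expansion: the value of the variable named by $1
  if [ -z "${!1}" ]; then
    echo "MISSING: $1"
    return 1
  fi
  echo "OK: $1=${!1}"
}

# Example values for illustration; use your real installation paths.
export JAVA_HOME=${JAVA_HOME:-/usr/lib/jvm/java-11-openjdk-amd64}
export HADOOP_HOME=${HADOOP_HOME:-/opt/hadoop-3.3.6}

check_var JAVA_HOME
check_var HADOOP_HOME
check_var PATH
```

Running this before `start-dfs.sh` catches the two most common misconfigurations (unset `JAVA_HOME` or `HADOOP_HOME`) with a clear message instead of a cryptic daemon failure.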
Key Takeaways
Download and extract the latest Hadoop release from the official Apache site.
Set environment variables like `JAVA_HOME` and `HADOOP_HOME` before running Hadoop.
Always format the namenode with `hdfs namenode -format` before starting services.
Start Hadoop daemons using the `start-dfs.sh` and `start-yarn.sh` scripts.
Use `jps` to verify that Hadoop processes are running correctly.