0
0
HadoopHow-ToBeginner ยท 4 min read

How to Install Hadoop: Step-by-Step Guide for Beginners

To install Hadoop, first download the latest stable release from the official Apache website, then extract it and configure environment variables like HADOOP_HOME. Finally, format the Hadoop filesystem and start the Hadoop services using start-dfs.sh and start-yarn.sh scripts.
๐Ÿ“

Syntax

Installing Hadoop involves these main steps:

  • Download: Get the Hadoop binary from the official Apache site.
  • Extract: Unpack the downloaded archive to a chosen directory.
  • Configure: Set environment variables like HADOOP_HOME and update PATH.
  • Format: Initialize the Hadoop filesystem with hdfs namenode -format.
  • Start: Launch Hadoop daemons using start-dfs.sh and start-yarn.sh.
bash
wget https://downloads.apache.org/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz

tar -xzf hadoop-3.3.6.tar.gz

export HADOOP_HOME=/path/to/hadoop-3.3.6
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

hdfs namenode -format

start-dfs.sh
start-yarn.sh
๐Ÿ’ป

Example

This example shows how to install Hadoop 3.3.6 on a Linux system, set environment variables, format the namenode, and start Hadoop services.

bash
wget https://downloads.apache.org/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz

# Extract the archive

tar -xzf hadoop-3.3.6.tar.gz

# Set environment variables (replace /home/user with your path)

export HADOOP_HOME=/home/user/hadoop-3.3.6
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

# Format the Hadoop filesystem

hdfs namenode -format

# Start Hadoop daemons

start-dfs.sh
start-yarn.sh

# Check running Java processes

jps
Output
Downloading hadoop-3.3.6.tar.gz... Extracting archive... Setting environment variables... Formatting namenode: SUCCESS Starting HDFS daemons... Starting YARN daemons... jps output: 12345 NameNode 12346 DataNode 12347 ResourceManager 12348 NodeManager 12349 SecondaryNameNode
โš ๏ธ

Common Pitfalls

Common mistakes when installing Hadoop include:

  • Not setting JAVA_HOME environment variable, which Hadoop needs to run.
  • Forgetting to format the namenode before starting services, causing startup errors.
  • Incorrect permissions on Hadoop directories, leading to access errors.
  • Not updating PATH to include Hadoop binaries, so commands are not found.

Always verify environment variables and directory permissions before starting Hadoop.

bash
### Wrong: Starting Hadoop without formatting namenode
start-dfs.sh

### Right: Format first, then start
hdfs namenode -format
start-dfs.sh
๐Ÿ“Š

Quick Reference

Summary tips for installing Hadoop:

  • Download the latest stable Hadoop release from Apache Hadoop Releases.
  • Set JAVA_HOME and HADOOP_HOME environment variables correctly.
  • Format the namenode before starting Hadoop services.
  • Use jps command to check running Hadoop daemons.
  • Ensure proper permissions on Hadoop installation and data directories.
โœ…

Key Takeaways

Download and extract the latest Hadoop release from the official Apache site.
Set environment variables like JAVA_HOME and HADOOP_HOME before running Hadoop.
Always format the namenode with hdfs namenode -format before starting services.
Start Hadoop daemons using start-dfs.sh and start-yarn.sh scripts.
Use jps to verify that Hadoop processes are running correctly.