How to Install Hadoop: Step-by-Step Guide for Beginners
To install Hadoop, download the latest stable release from the official Apache website, extract it, and configure environment variables such as HADOOP_HOME. Finally, format the Hadoop filesystem and start the Hadoop services with the start-dfs.sh and start-yarn.sh scripts.

Syntax
Installing Hadoop involves these main steps:
- Download: Get the Hadoop binary from the official Apache site.
- Extract: Unpack the downloaded archive to a chosen directory.
- Configure: Set environment variables like `HADOOP_HOME` and update `PATH`.
- Format: Initialize the Hadoop filesystem with `hdfs namenode -format`.
- Start: Launch Hadoop daemons using `start-dfs.sh` and `start-yarn.sh`.
```bash
wget https://downloads.apache.org/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz
tar -xzf hadoop-3.3.6.tar.gz
export HADOOP_HOME=/path/to/hadoop-3.3.6
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
hdfs namenode -format
start-dfs.sh
start-yarn.sh
```
Example
This example shows how to install Hadoop 3.3.6 on a Linux system, set environment variables, format the namenode, and start Hadoop services.
```bash
# Download Hadoop 3.3.6
wget https://downloads.apache.org/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz

# Extract the archive
tar -xzf hadoop-3.3.6.tar.gz

# Set environment variables (replace /home/user with your path)
export HADOOP_HOME=/home/user/hadoop-3.3.6
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

# Format the Hadoop filesystem
hdfs namenode -format

# Start Hadoop daemons
start-dfs.sh
start-yarn.sh

# Check running Java processes
jps
```
Output
Downloading hadoop-3.3.6.tar.gz...
Extracting archive...
Setting environment variables...
Formatting namenode: SUCCESS
Starting HDFS daemons...
Starting YARN daemons...
jps output:
12345 NameNode
12346 DataNode
12347 ResourceManager
12348 NodeManager
12349 SecondaryNameNode
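A healthy single-node installation shows all five daemons in the `jps` output above. The check can be scripted; this is a minimal sketch that uses a sample of the output shown above (in practice you would pipe `jps` directly instead of the hard-coded string):

```bash
# Sketch: verify that the expected Hadoop daemons appear in jps output.
# Sample output is hard-coded for illustration; replace with: jps_output=$(jps)
jps_output="12345 NameNode
12346 DataNode
12347 ResourceManager
12348 NodeManager
12349 SecondaryNameNode"

for daemon in NameNode DataNode ResourceManager NodeManager SecondaryNameNode; do
  # grep -w matches whole words, so "NameNode" does not match "SecondaryNameNode"
  if echo "$jps_output" | grep -qw "$daemon"; then
    echo "OK: $daemon"
  else
    echo "MISSING: $daemon"
  fi
done
```

If any daemon is missing, check its log file under `$HADOOP_HOME/logs` for the startup error.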
Common Pitfalls
Common mistakes when installing Hadoop include:
- Not setting the `JAVA_HOME` environment variable, which Hadoop needs to run.
- Forgetting to format the namenode before starting services, causing startup errors.
- Incorrect permissions on Hadoop directories, leading to access errors.
- Not updating `PATH` to include Hadoop binaries, so commands are not found.
Always verify environment variables and directory permissions before starting Hadoop.
```bash
# Wrong: starting Hadoop without formatting the namenode
start-dfs.sh

# Right: format first, then start
hdfs namenode -format
start-dfs.sh
```
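The `JAVA_HOME` pitfall is usually fixed by setting it in Hadoop's own environment file, `etc/hadoop/hadoop-env.sh`, so the daemons find Java regardless of the login shell. A minimal sketch; the JDK path is only an example, and the demo directory stands in for your real `HADOOP_HOME`:

```bash
# Sketch: persist JAVA_HOME in Hadoop's env file so the daemons can find Java.
HADOOP_HOME=${HADOOP_HOME:-/tmp/hadoop-demo}   # demo path for illustration
mkdir -p "$HADOOP_HOME/etc/hadoop"

JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64   # example; use your JDK location
echo "export JAVA_HOME=$JAVA_HOME" >> "$HADOOP_HOME/etc/hadoop/hadoop-env.sh"

# Confirm the setting was written
grep "JAVA_HOME" "$HADOOP_HOME/etc/hadoop/hadoop-env.sh"
```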
Quick Reference
Summary tips for installing Hadoop:
- Download the latest stable Hadoop release from Apache Hadoop Releases.
- Set `JAVA_HOME` and `HADOOP_HOME` environment variables correctly.
- Format the namenode before starting Hadoop services.
- Use the `jps` command to check running Hadoop daemons.
- Ensure proper permissions on Hadoop installation and data directories.
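The environment checks above can be bundled into a small pre-flight script. This is a sketch with a hypothetical `check_var` helper (not part of Hadoop); the paths used as defaults are examples only:

```bash
# Sketch: pre-flight check before starting Hadoop (hypothetical helper).
check_var() {
  # ${!1} is bash indirect expansion: the value of the variable named by $1
  if [ -z "${!1}" ]; then
    echo "MISSING: $1"
    return 1
  fi
  echo "OK: $1=${!1}"
}

# Example values for illustration; use your real installation paths.
export JAVA_HOME=${JAVA_HOME:-/usr/lib/jvm/java-11-openjdk-amd64}
export HADOOP_HOME=${HADOOP_HOME:-/opt/hadoop-3.3.6}

check_var JAVA_HOME
check_var HADOOP_HOME
check_var PATH
```

Running this before `start-dfs.sh` catches the two most common misconfigurations (unset `JAVA_HOME` or `HADOOP_HOME`) with a clear message instead of a cryptic daemon failure.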
Key Takeaways
Download and extract the latest Hadoop release from the official Apache site.
Set environment variables like `JAVA_HOME` and `HADOOP_HOME` before running Hadoop.
Always format the namenode with `hdfs namenode -format` before starting services.
Start Hadoop daemons using the `start-dfs.sh` and `start-yarn.sh` scripts.
Use `jps` to verify that Hadoop processes are running correctly.