HDFS high availability (HA) keeps your data accessible even when a NameNode fails. Without HA, the NameNode is a single point of failure: if it goes down, the entire cluster becomes unavailable.
HDFS high availability in Hadoop
1. Set up two NameNodes: one active and one standby.
2. Configure shared edit-log storage using a quorum of JournalNodes.
3. Use ZooKeeper to manage automatic failover between the NameNodes.
4. Update hdfs-site.xml with HA settings, including dfs.nameservices, dfs.ha.namenodes, dfs.namenode.rpc-address, dfs.namenode.http-address, dfs.namenode.shared.edits.dir, and dfs.client.failover.proxy.provider.
5. Start the JournalNodes, ZooKeeper, and both NameNodes.
6. Use hdfs haadmin commands to check NameNode status and switch the active NameNode if needed.
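Step 5 can be sketched with daemon commands; the syntax below is the Hadoop 3 form (Hadoop 2 uses hadoop-daemon.sh wrappers instead), and each command must be run on the appropriate host of an already-configured cluster:

```shell
# Startup order sketch (Hadoop 3 command syntax; run each on the right host).
# Assumes ZooKeeper is already running and hdfs-site.xml is configured for HA.

# 1. Start the JournalNodes (on each JournalNode host).
hdfs --daemon start journalnode

# 2. Initialize ZooKeeper state for automatic failover (once, from one NameNode host).
hdfs zkfc -formatZK

# 3. Start the first NameNode, then bootstrap and start the second.
hdfs --daemon start namenode          # on nn1's host
hdfs namenode -bootstrapStandby       # on nn2's host, one time only
hdfs --daemon start namenode          # on nn2's host

# 4. Start a ZKFailoverController on each NameNode host.
hdfs --daemon start zkfc
```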
This setup requires careful configuration of multiple components.
ZooKeeper helps decide which NameNode is active to avoid conflicts.
# Example hdfs-site.xml snippet for HA
<configuration>
  <property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>host1:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>host2:8020</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.mycluster.nn1</name>
    <value>host1:50070</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.mycluster.nn2</name>
    <value>host2:50070</value>
  </property>
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://host1:8485;host2:8485;host3:8485/mycluster</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.mycluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
</configuration>
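One setting the snippet above omits is fencing: Hadoop requires at least one fencing method to be configured before it will perform a failover, so the previously active NameNode can be cut off from the shared edits. A minimal sketch, assuming sshfence with an example key path:

```xml
<!-- Add to hdfs-site.xml; at least one fencing method is required for failover. -->
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence</value>
</property>
<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <!-- Example path; point this at the key the fencing user actually has. -->
  <value>/home/hdfs/.ssh/id_rsa</value>
</property>
```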
# Edge case: Only one NameNode configured (no HA)
<configuration>
<property>
<name>dfs.nameservices</name>
<value>mycluster</value>
</property>
<property>
<name>dfs.ha.namenodes.mycluster</name>
<value>nn1</value>
</property>
</configuration>

# Edge case: Standby NameNode is down
# The active NameNode continues serving data.
# Failover can be manual or automatic if ZooKeeper is configured.
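Automatic failover needs settings beyond the hdfs-site.xml snippet shown earlier: it must be enabled explicitly, and core-site.xml must point clients at the nameservice and at the ZooKeeper quorum. A sketch, with the ZooKeeper hostnames as placeholders:

```xml
<!-- hdfs-site.xml: turn on automatic failover via ZKFC. -->
<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>

<!-- core-site.xml: clients address the nameservice, not a single host. -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://mycluster</value>
</property>
<property>
  <name>ha.zookeeper.quorum</name>
  <!-- Placeholder hosts; list your actual ZooKeeper ensemble. -->
  <value>zk1:2181,zk2:2181,zk3:2181</value>
</property>
```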
This script shows how to check which NameNode is active or standby, and how to switch the active NameNode manually.
# This is a shell script example to check HDFS HA status

# Check the status of the NameNodes
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2

# Manually fail over from nn1 to nn2
hdfs haadmin -failover nn1 nn2

# Check the status again
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2
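The decision that the ZKFailoverController automates can be sketched as a small script. The get_state function below is a hypothetical stub standing in for `hdfs haadmin -getServiceState <id>`, so the example runs without a cluster; a real script would call the hdfs command directly:

```shell
#!/bin/sh
# Hypothetical stub standing in for `hdfs haadmin -getServiceState <id>`.
# Simulates nn1 having failed while nn2 is healthy in standby.
get_state() {
  case "$1" in
    nn1) echo "unreachable" ;;
    nn2) echo "standby" ;;
  esac
}

# Promote the standby only when the active is gone and the standby is healthy,
# mirroring the check performed before a failover is triggered.
maybe_failover() {
  if [ "$(get_state "$1")" != "active" ] && [ "$(get_state "$2")" = "standby" ]; then
    echo "failover: promote $2"   # real script: hdfs haadmin -failover "$1" "$2"
  else
    echo "no action needed"
  fi
}

maybe_failover nn1 nn2
```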
Time complexity: HA adds a small write overhead, since each edit must reach a quorum of JournalNodes, but it keeps the system responsive through a failover.
Space complexity: Requires extra storage for the shared edit logs on the JournalNodes and a full metadata copy on the standby NameNode.
Common mistake: Misconfiguring ZooKeeper, or not running a ZKFailoverController (ZKFC) on each NameNode host, can cause automatic failover to fail.
Use HA when you cannot tolerate NameNode downtime. For small or non-critical clusters, a single NameNode may be enough.
HDFS high availability uses two NameNodes to avoid downtime.
Shared storage and ZooKeeper help manage failover automatically.
Proper configuration is key to keep data safe and accessible.