
Backup and disaster recovery in Hadoop - Mini Project: Build & Apply

Backup and Disaster Recovery in Hadoop
📖 Scenario: You work as a data engineer managing a Hadoop cluster that stores important company data. To protect against data loss from hardware failure or accidental deletion, you need to create a backup of critical files and set up a simple disaster recovery plan.
🎯 Goal: Build a basic Hadoop backup and disaster recovery script that copies important files from the main Hadoop Distributed File System (HDFS) directory to a backup directory. Then, verify the backup by listing the files in the backup location.
📋 What You'll Learn
Create a list of important HDFS file paths to back up
Set a backup directory path variable
Write a loop to copy each important file to the backup directory using Hadoop commands
Print the list of files in the backup directory to confirm backup success
💡 Why This Matters
🌍 Real World
Backing up data in Hadoop is critical to prevent data loss from hardware failures or accidental deletions. This project simulates a simple backup and recovery process.
💼 Career
Data engineers and Hadoop administrators regularly create backup and disaster recovery plans to ensure data safety and business continuity.
1
Create a list of important HDFS files
Create a list called important_files with these exact HDFS file paths: /data/sales.csv, /data/customers.csv, and /data/products.csv.
Hadoop
Need a hint?

Use square brackets to create a list and include the file paths as strings.
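A minimal Python sketch of this step, using the exact list name and HDFS paths given above:

```python
# List of important HDFS file paths to back up, as specified in this step
important_files = ["/data/sales.csv", "/data/customers.csv", "/data/products.csv"]

print(important_files)
```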

2
Set the backup directory path
Create a variable called backup_dir and set it to the string /backup/, which will serve as the HDFS backup directory.
Hadoop
Need a hint?

Assign the string '/backup/' to the variable backup_dir.
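This step is a single assignment. A sketch, keeping the trailing slash so the path works cleanly as a copy target:

```python
# HDFS backup directory path; the trailing slash marks it as a directory target
backup_dir = "/backup/"

print(backup_dir)
```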

3
Copy important files to the backup directory
Write a for loop using the variable file_path to iterate over important_files. Inside the loop, use the Hadoop command hadoop fs -cp to copy each file_path to backup_dir. Use an f-string to build the command string exactly as: hadoop fs -cp {file_path} {backup_dir}. Use os.system() to run the command.
Hadoop
Need a hint?

Use a for loop with file_path and build the copy command with an f-string. Then run it with os.system().
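Putting the loop together, a sketch that assumes the important_files list and backup_dir variable from the earlier steps (redeclared here so the snippet is self-contained). Note that os.system simply hands the string to the shell, so this only copies files if the hadoop CLI is installed and the paths exist:

```python
import os

# Values assumed from the earlier steps
important_files = ["/data/sales.csv", "/data/customers.csv", "/data/products.csv"]
backup_dir = "/backup/"

for file_path in important_files:
    # Build the HDFS copy command with an f-string, exactly as the step describes
    command = f"hadoop fs -cp {file_path} {backup_dir}"
    os.system(command)
```

For production scripts, subprocess.run with an argument list is generally preferred over os.system, since it avoids shell-quoting issues and lets you check the return code.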

4
List files in the backup directory to verify
Use os.system() to run the Hadoop command hadoop fs -ls {backup_dir} to list the files in the backup directory and confirm the backup.
Hadoop
Need a hint?

Use os.system() with the list command to show files in the backup directory.
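A sketch of the verification step, again assuming the backup_dir value from step 2. The hadoop fs -ls output goes straight to the terminal, so a successful run prints one line per backed-up file:

```python
import os

backup_dir = "/backup/"  # assumed from step 2

# List the backup directory to confirm the copied files are present
list_command = f"hadoop fs -ls {backup_dir}"
os.system(list_command)
```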