Which of the following best describes the primary purpose of Hadoop's DistCp tool in backup and disaster recovery?
Think about tools designed to move data between clusters.
DistCp (Distributed Copy) is designed to copy large datasets efficiently across Hadoop clusters, which is essential for backup and disaster recovery.
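As a sketch, a typical cross-cluster backup with DistCp looks like the following (the NameNode hostnames, ports, and paths here are illustrative, not from the question):

```shell
# Copy /user/data from a production cluster to a backup cluster.
# -update copies only files that differ; -p preserves permissions/timestamps.
hadoop distcp -update -p hdfs://prod-nn:8020/user/data \
    hdfs://backup-nn:8020/backup/data
```

DistCp runs as a MapReduce job, so the copy is parallelized across the cluster, which is what makes it practical for large datasets.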
What will be the output of the following HDFS command sequence?
hdfs dfs -mkdir /data
hdfs dfs -put file1.txt /data/
hdfs dfs -createSnapshot /data snap1
hdfs dfs -rm /data/file1.txt
hdfs dfs -ls /data/.snapshot/snap1
Remember what snapshots do in HDFS.
HDFS snapshots preserve the state of a directory at the moment the snapshot is created (note that snapshots can only be taken on a directory that has first been made snapshottable with hdfs dfsadmin -allowSnapshot). Even though file1.txt is deleted afterwards, it remains visible under /data/.snapshot/snap1.
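This is also how you recover the file: the snapshot directory is read-only, so you copy the preserved file back out. A minimal sketch (paths taken from the question above):

```shell
# Restore the deleted file from the snapshot back into the live directory
hdfs dfs -cp /data/.snapshot/snap1/file1.txt /data/
```

After the copy, /data contains file1.txt again, while snap1 itself is unchanged.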
You run a Hadoop backup job that uses compression. The original data size is 500 GB. After backup, the compressed backup size is 150 GB. What is the compression ratio?
Compression ratio = original size / compressed size.
The compression ratio is the original data size divided by the compressed size: 500 GB / 150 GB ≈ 3.33, i.e., a ratio of about 3.33:1.
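The arithmetic can be checked with a few lines of Python (sizes in GB, taken from the question):

```python
original_gb = 500
compressed_gb = 150

# Compression ratio = original size / compressed size
ratio = original_gb / compressed_gb
print(round(ratio, 2))  # → 3.33
```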
Consider this Hadoop backup script snippet:
hdfs dfs -mkdir /backup
hadoop distcp /user/data /backup/data_backup
hdfs dfs -rm -r /user/data
What is the main risk or error in this script?
Think about safe backup practices.
Deleting the original data immediately after the copy, without verifying that the DistCp job succeeded, risks permanent data loss if the copy fails or is incomplete.
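A safer version of the script gates the delete on DistCp's exit status. This is a sketch using the paths from the question; in practice you would also compare file counts or checksums before removing anything:

```shell
hdfs dfs -mkdir -p /backup
if hadoop distcp /user/data /backup/data_backup; then
    # DistCp exited 0: the copy completed. Verify before deleting if possible.
    hdfs dfs -rm -r /user/data
else
    # Non-zero exit: leave the source data untouched.
    echo "DistCp failed; /user/data not deleted" >&2
    exit 1
fi
```

Since DistCp returns a non-zero exit code when the job fails, the original data is only removed on a successful copy.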
You are tasked with designing a disaster recovery plan for a Hadoop cluster that must minimize downtime and data loss. Which combination of strategies is best?
Consider both data safety and recovery speed.
Combining HDFS snapshots, which give fast local point-in-time recovery from accidental deletes or corruption, with periodic remote replication via DistCp minimizes downtime and also protects against cluster-wide failures such as a full site outage.
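The two strategies compose naturally: take a snapshot for a consistent image, then replicate that snapshot to the remote cluster. A hedged sketch (the snapshot name, schedule, and DR cluster address are illustrative assumptions):

```shell
# One-time setup: make the directory snapshottable
hdfs dfsadmin -allowSnapshot /user/data

# Take a consistent point-in-time snapshot
hdfs dfs -createSnapshot /user/data daily_backup

# Replicate the snapshot image (a stable, read-only view) to a remote cluster
hadoop distcp -update /user/data/.snapshot/daily_backup \
    hdfs://dr-cluster:8020/backup/user_data
```

Copying from the .snapshot path rather than the live directory avoids inconsistencies from files changing mid-copy.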