Data lake design patterns in Hadoop - Interactive Code Practice

Practice - 5 Tasks
Answer the questions below
1. Fill in the blank (easy)

Complete the code to create a raw zone directory in HDFS for the data lake.

Hadoop
hdfs dfs -mkdir /data_lake/[1]
A. analytics
B. raw_zone
C. processed_data
D. backup
Common Mistakes
Using processed_data instead of raw_zone
Creating backup folder instead of raw zone
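
As a minimal sketch of how the zone layout could be scripted, the Python below builds the hdfs dfs -mkdir commands for the raw and curated zones named in this practice set and prints them without running anything. The helper name build_mkdir_commands is hypothetical, and the -p flag (which also creates missing parent directories such as /data_lake) is an addition for illustration; the task's own command does not use it.

```python
import posixpath
import shlex

DATA_LAKE_ROOT = "/data_lake"  # root path used throughout this practice set


def build_mkdir_commands(zones, root=DATA_LAKE_ROOT):
    """Return one `hdfs dfs -mkdir -p` argv list per zone (hypothetical helper)."""
    return [
        ["hdfs", "dfs", "-mkdir", "-p", posixpath.join(root, zone)]
        for zone in zones
    ]


commands = build_mkdir_commands(["raw_zone", "curated_zone"])
for argv in commands:
    # Print the command instead of executing it, so no cluster is needed.
    print(shlex.join(argv))
```

Keeping the commands as argument lists means they could later be handed to subprocess.run unchanged, without any shell quoting concerns.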
2. Fill in the blank (medium)

Complete the code to move data from the raw zone to the curated zone in HDFS.

Hadoop
hdfs dfs -mv /data_lake/raw_zone/[1] /data_lake/curated_zone/
A. temp_data.csv
B. backup_data.csv
C. old_data.csv
D. log_data.csv
Common Mistakes
Moving backup_data.csv instead of temp_data.csv
Using wrong source or destination paths
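
To make the source and destination paths less error-prone, a move between zones could be assembled programmatically. The sketch below only builds and prints the hdfs dfs -mv command; the helper build_mv_command and the file name events.csv are hypothetical and are not part of the exercise.

```python
import posixpath
import shlex


def build_mv_command(file_name, src_zone="raw_zone", dst_zone="curated_zone",
                     root="/data_lake"):
    """Build the argv for moving one file between zones (hypothetical helper)."""
    source = posixpath.join(root, src_zone, file_name)
    # A trailing slash makes it explicit that the destination is a directory.
    destination = posixpath.join(root, dst_zone) + "/"
    return ["hdfs", "dfs", "-mv", source, destination]


# "events.csv" is an invented file name used only for illustration.
mv_command = build_mv_command("events.csv")
print(shlex.join(mv_command))
```

Deriving both paths from the same root and zone names removes the chance of mistyping either one by hand.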
3. Fill in the blank (hard)

Complete the Spark code so that it reads Parquet data from the curated zone of the data lake.

Hadoop
df = spark.read.format('parquet').load('/data_lake/[1]/2024/06/01')
A. curated_zone
B. temp_zone
C. archive_zone
D. raw_zone
Common Mistakes
Reading from raw_zone instead of curated_zone
Using archive_zone which is for old data
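
The load path in this task ends in a year/month/day directory. As a small sketch, assuming that layout holds for every day, the path could be derived from a date instead of being typed by hand. The helper curated_path_for is hypothetical; Spark itself is not imported, so the snippet runs without a cluster.

```python
from datetime import date
import posixpath


def curated_path_for(day, root="/data_lake", zone="curated_zone"):
    """Return the year/month/day directory for a date (hypothetical helper)."""
    return posixpath.join(root, zone, day.strftime("%Y/%m/%d"))


path = curated_path_for(date(2024, 6, 1))
print(path)  # /data_lake/curated_zone/2024/06/01
```

The resulting string could then be passed to spark.read.format('parquet').load(path) in a session where spark is available.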
4. Fill in the blank (hard)

Fill both blanks to create a partitioned table in Hive for the data lake's curated data.

Hadoop
CREATE EXTERNAL TABLE curated_data (id INT, name STRING) PARTITIONED BY ([1] STRING, [2] STRING) STORED AS PARQUET LOCATION '/data_lake/curated_zone/';
A. year
B. month
C. day
D. hour
Common Mistakes
Using day or hour as first partition instead of year
Not partitioning by time fields
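
Creating an external partitioned table does not by itself make existing directories visible as partitions; each one has to be registered, for example with ALTER TABLE ... ADD PARTITION or MSCK REPAIR TABLE. The sketch below generates one such statement. It assumes the partition columns are year and month (month as the second column is an assumption) and a year=.../month=... directory layout, neither of which is stated in the exercise; add_partition_statement is a hypothetical helper.

```python
def add_partition_statement(table, location_root, year, month):
    """Build a HiveQL statement registering one year/month partition (hypothetical helper)."""
    # Assumed directory layout: <root>/year=<year>/month=<month>
    location = f"{location_root.rstrip('/')}/year={year}/month={month}"
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION (year='{year}', month='{month}') "
        f"LOCATION '{location}';"
    )


statement = add_partition_statement(
    "curated_data", "/data_lake/curated_zone/", "2024", "06"
)
print(statement)
```

Partition values are passed as strings because both partition columns in the table definition are declared as STRING.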
5. Fill in the blank (hard)

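Fill in the blanks to create a dictionary comprehension that maps file names to their sizes for raw-zone files larger than 100 MB. Here files is assumed to be a dictionary mapping each file name to its size in MB; blank [1] appears three times and blank [2] once.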

Hadoop
file_sizes = { [1]: [2] for [1] in files if files[[1]] > 100 }
A. file
B. files[file]
C. size
D. files
Common Mistakes
Using size as loop variable which is undefined
Using files instead of files[file] to get size
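
A worked example may help show why the loop variable and the lookup have to agree. The sketch below assumes files is a dictionary mapping file names to sizes in MB; the sample names and sizes are invented for illustration.

```python
# Hypothetical sample data: file name -> size in MB (values invented for illustration).
files = {
    "clicks.csv": 250,
    "users.csv": 40,
    "orders.csv": 100,
    "logs.csv": 512,
}

# Keep only files strictly larger than 100 MB, mapping name -> size.
file_sizes = {file: files[file] for file in files if files[file] > 100}
print(file_sizes)  # {'clicks.csv': 250, 'logs.csv': 512}

# Equivalent form that avoids the repeated lookup by iterating over items().
file_sizes_items = {name: size for name, size in files.items() if size > 100}
```

Note that a file of exactly 100 MB is excluded, because the comparison is strictly greater than 100.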