Complete the code to create a raw zone directory in HDFS for the data lake.
hdfs dfs -mkdir /data_lake/[1]
The raw zone is where data is first ingested in its original form.
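For context, the exercises assume a data lake laid out as zone directories under `/data_lake` (the raw, curated, and archive zones mentioned in these items). The `zone_path` helper below is a hypothetical sketch of that layout, not part of the exercise answers:

```python
# Hypothetical sketch of the data-lake zone layout assumed by these exercises.
ZONES = ("raw_zone", "curated_zone", "archive_zone")

def zone_path(zone: str, base: str = "/data_lake") -> str:
    # Reject names outside the known zones to catch typos early.
    if zone not in ZONES:
        raise ValueError(f"unknown zone: {zone}")
    return f"{base}/{zone}"

print(zone_path("raw_zone"))  # /data_lake/raw_zone
```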
Complete the code to move data from the raw zone to the curated zone in HDFS.
hdfs dfs -mv /data_lake/raw_zone/[1] /data_lake/curated_zone/
temp_data.csv is the file being moved from the raw zone to the curated zone for processing.
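When such moves are scripted rather than typed interactively, it can help to assemble the command as an argument list first (e.g. for `subprocess.run`). The helper below is a hypothetical sketch; only the `hdfs dfs -mv` invocation itself comes from the exercise:

```python
# Hypothetical helper: build the `hdfs dfs -mv` argument list for review
# or execution via subprocess.run; paths mirror the exercise's hint.
def hdfs_mv_cmd(src: str, dst: str) -> list[str]:
    return ["hdfs", "dfs", "-mv", src, dst]

cmd = hdfs_mv_cmd("/data_lake/raw_zone/temp_data.csv", "/data_lake/curated_zone/")
print(" ".join(cmd))  # hdfs dfs -mv /data_lake/raw_zone/temp_data.csv /data_lake/curated_zone/
```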
Fix the error in the Spark code to read data from the curated zone in the data lake.
df = spark.read.format('parquet').load('/data_lake/[1]/2024/06/01')
Data should be read from the curated zone after processing, not from the raw or archive zones.
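The load path above follows a `year/month/day` directory layout. A small helper like the hypothetical `curated_path` below can build such paths consistently (the function name and zero-padding convention are assumptions for illustration):

```python
from datetime import date

# Hypothetical: build a curated-zone read path for a given ingestion date,
# matching the /data_lake/curated_zone/2024/06/01 layout in the exercise.
def curated_path(d: date, base: str = "/data_lake/curated_zone") -> str:
    # Zero-pad month and day so paths sort lexicographically by date.
    return f"{base}/{d.year:04d}/{d.month:02d}/{d.day:02d}"

print(curated_path(date(2024, 6, 1)))  # /data_lake/curated_zone/2024/06/01
```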
Fill both blanks to create a partitioned table in Hive for the data lake's curated data.
CREATE EXTERNAL TABLE curated_data (id INT, name STRING) PARTITIONED BY ([1] STRING, [2] STRING) STORED AS PARQUET LOCATION '/data_lake/curated_zone/';
Partitioning by year and month helps organize data efficiently for queries.
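For an external table like the one above, new year/month partitions typically have to be registered before they are queryable. The helper below is a hypothetical sketch that renders a standard Hive `ALTER TABLE ... ADD PARTITION` statement; the `year=.../month=...` directory convention is an assumption for illustration:

```python
# Hypothetical: render a Hive ADD PARTITION statement for a year/month
# partitioned external table over the curated zone.
def add_partition_sql(year: str, month: str,
                      table: str = "curated_data",
                      base: str = "/data_lake/curated_zone") -> str:
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION (year='{year}', month='{month}') "
        f"LOCATION '{base}/year={year}/month={month}'"
    )

print(add_partition_sql("2024", "06"))
```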
Fill all three blanks to create a dictionary comprehension that maps file names to their sizes for files larger than 100MB in the raw zone.
file_sizes = { [1]: [2] for [1] in files if files[[1]] > 100 }
The comprehension iterates over file names, maps each name to its size, and keeps only files whose size exceeds 100 MB.
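For reference, a runnable version of this pattern with hypothetical sample data (file names and sizes are invented for illustration) might look like:

```python
# Hypothetical sample data: file name -> size in MB.
files = {"events.csv": 250, "users.csv": 40, "clicks.csv": 180}

# Keep only files strictly larger than 100 MB, mapping name -> size.
file_sizes = {name: files[name] for name in files if files[name] > 100}

print(file_sizes)  # {'events.csv': 250, 'clicks.csv': 180}
```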