In Hadoop, you want to quickly read compressed data with minimal CPU overhead. Which compression codec is generally best for fast decompression?
Think about which codec is designed for speed rather than maximum compression.
Snappy is designed to provide fast compression and decompression speeds, making it ideal for scenarios where speed is more important than maximum compression ratio.
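Because decompression speed is Snappy's strength, one common place to enable it is for intermediate map output, which is written and read back within the same job. As a configuration sketch (these are the standard property names from mapred-default.xml; adjust for your distribution), the relevant mapred-site.xml entries would be:

```xml
<property>
  <name>mapreduce.map.output.compress</name>
  <value>true</value>
</property>
<property>
  <name>mapreduce.map.output.compress.codec</name>
  <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
```

Intermediate data is short-lived, so the smaller savings from Snappy's lighter compression matter less than the CPU time saved on every shuffle read.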
You compress the same 100MB text file using Gzip and Snappy codecs in Hadoop. Which statement correctly describes the expected compressed file sizes?
Consider which codec compresses better but is slower.
Gzip typically achieves a better compression ratio than Snappy, so the compressed file size will be smaller with Gzip.
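Snappy is not part of the JDK, but the same speed-versus-ratio trade-off can be illustrated with the DEFLATE algorithm that underlies Gzip: a low effort level compresses faster and leaves larger output, much as Snappy does relative to Gzip. This is an illustrative JDK sketch, not a Hadoop API; the class and method names are made up for the example.

```java
import java.util.zip.Deflater;

public class CompressionTradeoff {
    // Compress data at the given DEFLATE effort level; return the compressed size in bytes.
    static int compressedSize(byte[] data, int level) {
        Deflater deflater = new Deflater(level);
        deflater.setInput(data);
        deflater.finish();
        byte[] buf = new byte[data.length + 64];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buf);   // we only count bytes, so reusing buf is fine
        }
        deflater.end();
        return total;
    }

    public static void main(String[] args) {
        // A few KB of repetitive, log-like text: typical compressible input.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 500; i++) {
            sb.append("2024-01-01 INFO request id=").append(i).append(" served in 12ms\n");
        }
        byte[] data = sb.toString().getBytes();

        int fast = compressedSize(data, Deflater.BEST_SPEED);        // level 1: fast, larger output
        int best = compressedSize(data, Deflater.BEST_COMPRESSION);  // level 9: slower, smaller output
        System.out.println("original=" + data.length + " fast=" + fast + " best=" + best);
    }
}
```

On repetitive input like this, the high-effort level produces output no larger than the fast level, mirroring why a Gzip-compressed file is normally smaller than the Snappy-compressed copy of the same data.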
You try to read LZO compressed files in Hadoop but get errors. What is the most likely cause?
Think about Hadoop's support for LZO and what is required to use it.
LZO is GPL-licensed, so Hadoop does not bundle it. To read LZO compressed files, the hadoop-lzo library and its native libraries must be installed on every node and the codec registered in the configuration; without them, decompression fails with codec-not-found or native-library errors.
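As a setup sketch, registering the hadoop-lzo codec classes in core-site.xml looks like the following (the codec class names come from the hadoop-lzo project; the exact codec list should match what your cluster already declares):

```xml
<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.SnappyCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec</value>
</property>
<property>
  <name>io.compression.codec.lzo.class</name>
  <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
```

The jar must also be on the Hadoop classpath and the native liblzo2/hadoop-lzo libraries present in the native library path on all nodes, not just the client.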
You have large binary files to process in Hadoop. You want moderate compression to save space but also fast read times for analysis. Which codec should you choose?
Consider codecs that balance speed and compression but may need extra setup.
LZO offers a good balance between compression ratio and speed, making it suitable for large binary files where moderate compression and fast reads are needed. However, it requires the hadoop-lzo jar and native libraries to be installed on every node.
Given the following Hadoop configuration snippet, what compression codec will be used for output files?
conf.set("mapreduce.output.fileoutputformat.compress", "true");
conf.set("mapreduce.output.fileoutputformat.compress.codec", "org.apache.hadoop.io.compress.SnappyCodec");
conf.set("mapreduce.output.fileoutputformat.compress.type", "BLOCK");
Look carefully at the codec class name and compression type.
The configuration explicitly enables output compression, names the SnappyCodec class, and selects BLOCK compression, so output files will be Snappy-compressed. The BLOCK setting groups many records per compression unit and applies to SequenceFile outputs.
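The same three settings can also be made cluster-wide defaults rather than per-job code. As a configuration sketch, the equivalent mapred-site.xml entries would be:

```xml
<property>
  <name>mapreduce.output.fileoutputformat.compress</name>
  <value>true</value>
</property>
<property>
  <name>mapreduce.output.fileoutputformat.compress.codec</name>
  <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
<property>
  <name>mapreduce.output.fileoutputformat.compress.type</name>
  <value>BLOCK</value>
</property>
```

Per-job `conf.set(...)` calls override these site-level defaults, which is useful when most jobs want Snappy but a few need Gzip's better ratio for archival output.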