
Compression codecs (Snappy, LZO, Gzip) in Hadoop - Practice Problems & Coding Challenges

Challenge - 5 Problems
🧠 Conceptual (intermediate)
Which compression codec is best suited for fast decompression in Hadoop?

In Hadoop, you want to quickly read compressed data with minimal CPU overhead. Which compression codec is generally best for fast decompression?

A. Snappy, because it is optimized for speed over compression ratio.
B. Gzip, because it has the highest compression ratio.
C. LZO, because it uses the least memory during decompression.
D. No compression, to avoid decompression overhead.
💡 Hint

Think about which codec is designed for speed rather than maximum compression.
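Snappy's speed-first design is why it is a common choice for intermediate (map output) data, which is written and read back within the same job. A minimal sketch of enabling it, using the standard Hadoop property names (this is a configuration fragment, not a complete driver class):

```java
import org.apache.hadoop.conf.Configuration;

// Sketch: compress intermediate map output with Snappy, where fast
// compression and decompression matter more than ratio. Requires the
// Hadoop client libraries and native Snappy support on the nodes.
Configuration conf = new Configuration();
conf.set("mapreduce.map.output.compress", "true");
conf.set("mapreduce.map.output.compress.codec",
         "org.apache.hadoop.io.compress.SnappyCodec");
```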

📊 Data Output (intermediate)
What is the output size difference when compressing a 100MB text file with Gzip vs Snappy?

You compress the same 100MB text file using Gzip and Snappy codecs in Hadoop. Which statement correctly describes the expected compressed file sizes?

A. Both files will be about the same size because they use similar algorithms.
B. Gzip file will be smaller than Snappy file due to better compression ratio.
C. Snappy file will be smaller than Gzip file because it compresses better.
D. Snappy file will be larger than the original 100MB file.
💡 Hint

Consider which codec compresses better but is slower.
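Snappy itself is not in the JDK, but the same speed-versus-ratio tradeoff can be demonstrated with DEFLATE compression levels from `java.util.zip`: level 1 plays the role of a speed-first codec like Snappy, level 9 the role of a ratio-first codec like Gzip. A runnable sketch:

```java
import java.util.zip.Deflater;

public class RatioDemo {
    // Compress data at the given DEFLATE level and return the compressed size.
    static int compressedSize(byte[] data, int level) {
        Deflater deflater = new Deflater(level);
        deflater.setInput(data);
        deflater.finish();
        byte[] buf = new byte[data.length + 64];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buf);
        }
        deflater.end();
        return total;
    }

    public static void main(String[] args) {
        // Repetitive text compresses well; higher levels spend more CPU
        // searching for matches and typically produce smaller output.
        byte[] text = "the quick brown fox jumps over the lazy dog\n"
                .repeat(10_000).getBytes();
        int fast = compressedSize(text, Deflater.BEST_SPEED);       // speed-first, like Snappy
        int best = compressedSize(text, Deflater.BEST_COMPRESSION); // ratio-first, like Gzip
        System.out.println("level 1: " + fast + " bytes, level 9: " + best
                + " bytes, original: " + text.length + " bytes");
    }
}
```

The exact sizes depend on the input, but on text data the ratio-first setting reliably yields the smaller file, mirroring the Gzip-versus-Snappy result the question asks about.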

🔧 Debug (advanced)
Why does Hadoop fail to read LZO-compressed files by default?

You try to read LZO-compressed files in Hadoop but get errors. What is the most likely cause?

A. LZO codec is not installed or native libraries are missing on the Hadoop nodes.
B. LZO files are corrupted and cannot be decompressed.
C. Hadoop does not support LZO compression at all.
D. The input files are not compressed with LZO but with Gzip.
💡 Hint

Think about Hadoop's support for LZO and what is required to use it.
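Because LZO is GPL-licensed, it is not bundled with Hadoop: the hadoop-lzo jar and the native liblzo2 library must be installed on every node, and the codec classes registered. A sketch of the registration step (a configuration fragment; the codec list would normally live in core-site.xml):

```java
import org.apache.hadoop.conf.Configuration;

// Sketch: registering the separately installed hadoop-lzo codecs so
// Hadoop can read .lzo files. Without the jar and native liblzo2 on
// each node, jobs fail with "codec not found"-style errors.
Configuration conf = new Configuration();
conf.set("io.compression.codecs",
    "org.apache.hadoop.io.compress.DefaultCodec,"
  + "org.apache.hadoop.io.compress.GzipCodec,"
  + "com.hadoop.compression.lzo.LzoCodec,"
  + "com.hadoop.compression.lzo.LzopCodec");
```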

🚀 Application (advanced)
Choosing the best codec for a Hadoop job with large binary files that need moderate compression and fast read times

You have large binary files to process in Hadoop. You want moderate compression to save space but also fast read times for analysis. Which codec should you choose?

A. Gzip, for maximum compression despite slower reads.
B. No compression, to avoid any decompression overhead.
C. LZO, for a balance of compression and speed but requires setup.
D. Snappy, for fast reads with moderate compression.
💡 Hint

Consider codecs that balance speed and compression but may need extra setup.
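Snappy gives fast reads with moderate compression, but it depends on Hadoop's native libraries being loaded on the cluster. A small sketch of a sanity check before committing a job to Snappy, using Hadoop's `NativeCodeLoader` utility (a fragment; requires the Hadoop libraries on the classpath):

```java
import org.apache.hadoop.util.NativeCodeLoader;

// Sketch: verify native Hadoop libraries are loaded before relying on
// Snappy. If they are missing, Snappy compression fails at runtime
// rather than at configuration time.
if (NativeCodeLoader.isNativeCodeLoaded()) {
    System.out.println("Native Hadoop libraries loaded; Snappy should be usable.");
} else {
    System.out.println("Native libraries missing; consider Gzip or fix the install.");
}
```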

Predict Output (expert)
What is the output of this Hadoop compression codec configuration snippet?

Given the following Hadoop configuration snippet, what compression codec will be used for output files?

Java
conf.set("mapreduce.output.fileoutputformat.compress", "true");
conf.set("mapreduce.output.fileoutputformat.compress.codec", "org.apache.hadoop.io.compress.SnappyCodec");
conf.set("mapreduce.output.fileoutputformat.compress.type", "BLOCK");
A. Output files will be compressed using LZO codec with record compression.
B. Output files will be compressed using Gzip codec with block compression.
C. Output files will not be compressed because compression type is invalid.
D. Output files will be compressed using Snappy codec with block compression.
💡 Hint

Look carefully at the codec class name and compression type.
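The three properties in the snippet fit together in a job driver as sketched below. Note that the `.compress.type` setting ("RECORD" or "BLOCK") only applies to SequenceFile outputs; "BLOCK" groups many records per compression block, which works well with Snappy's modest per-record ratio. (A configuration fragment, assuming a standard MapReduce job setup.)

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

// Sketch: Snappy block compression for job output, as in the snippet.
Configuration conf = new Configuration();
conf.set("mapreduce.output.fileoutputformat.compress", "true");
conf.set("mapreduce.output.fileoutputformat.compress.codec",
         "org.apache.hadoop.io.compress.SnappyCodec");
conf.set("mapreduce.output.fileoutputformat.compress.type", "BLOCK");
Job job = Job.getInstance(conf, "snappy-block-output");
```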