Recall & Review
beginner
What is the main purpose of compression codecs like Snappy, LZO, and Gzip in Hadoop?
Compression codecs reduce the size of data stored or transferred in Hadoop, saving storage space and speeding up jobs by cutting disk I/O and network transfer time.
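As a quick illustration of the point above, the standard-library gzip module (the only one of the three codecs shipped with Python) shows how much a codec can shrink repetitive data before it is stored or moved across the network. The payload here is a made-up example, not Hadoop data.

```python
import gzip

# Illustration only: compress a repetitive payload to show how a codec
# shrinks data before storage or transfer.
data = b"hadoop block data " * 1000  # ~18 KB of repetitive bytes
compressed = gzip.compress(data)

print(len(data), len(compressed))
# The compressed payload is far smaller, so less data has to cross
# the disk and the network during a job.
```

The more repetitive the input, the larger the savings; real HDFS blocks compress less dramatically than this synthetic example.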
beginner
Which compression codec among Snappy, LZO, and Gzip is known for the fastest compression and decompression speed?
Snappy is known for very fast compression and decompression speeds, making it suitable for real-time data processing.
intermediate
How does Gzip compression compare to Snappy and LZO in terms of compression ratio and speed?
Gzip offers a higher compression ratio (smaller files) but is slower in compression and decompression compared to Snappy and LZO.
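Snappy and LZO have no standard-library bindings, but gzip's own compression levels expose the same ratio-versus-speed trade-off in miniature: level 1 favors speed, level 9 favors a smaller output. A minimal sketch, using synthetic data as a stand-in:

```python
import gzip
import random

# Illustration only: lower gzip levels trade compression ratio for speed,
# mirroring the Snappy/LZO-versus-Gzip trade-off described above.
random.seed(0)
data = bytes(random.randrange(16) for _ in range(50_000))  # mildly compressible

fast = gzip.compress(data, compresslevel=1)   # quicker, larger output
tight = gzip.compress(data, compresslevel=9)  # slower, smaller output

print(len(fast), len(tight))
```

In a real cluster you would benchmark the actual codecs on your own data, since the ratio and speed both depend heavily on the input.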
intermediate
What is a key limitation of LZO compression in Hadoop?
LZO requires an index file to enable splitting of compressed files for parallel processing, which adds setup complexity.
beginner
Why might you choose Snappy over Gzip for compressing Hadoop data?
Choose Snappy when you need faster data processing and can accept a lower compression ratio, as it speeds up reading and writing data.
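One common place this choice shows up is intermediate map output, which is written and read once and so benefits from a fast codec. A sketch of the relevant `mapred-site.xml` settings, assuming a standard Hadoop 2+/3 installation with Snappy native libraries available:

```xml
<!-- Sketch only: enable Snappy for intermediate map output.
     Property names are the standard Hadoop 2+/3 settings. -->
<property>
  <name>mapreduce.map.output.compress</name>
  <value>true</value>
</property>
<property>
  <name>mapreduce.map.output.compress.codec</name>
  <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
```

For final job output, where files may be stored long-term, a higher-ratio codec such as Gzip can be the better fit.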
Which compression codec is fastest for decompression in Hadoop?
Snappy is designed for very fast compression and decompression speeds.
Which codec typically produces the smallest compressed files?
Gzip compresses data more tightly, resulting in smaller files but slower speed.
What extra file does LZO require for Hadoop to split compressed files?
LZO needs an index file to allow Hadoop to split files for parallel processing.
If you want to speed up Hadoop data processing with compression, which codec is best?
Snappy offers fast compression and decompression, improving processing speed.
Which codec is slower but compresses data more tightly?
Gzip compresses data more but is slower than Snappy and LZO.
Explain the trade-offs between Snappy, LZO, and Gzip compression codecs in Hadoop.
Think about speed versus file size and Hadoop processing needs.
Describe why compression codecs are important in Hadoop data processing.
Consider how big data systems handle lots of data.