
Compression codecs (Snappy, LZO, Gzip) in Hadoop - Mini Project: Build & Apply

Using Compression Codecs in Hadoop
📖 Scenario: You work with large data files in Hadoop. To save space and speed up processing, you want to compress files using different codecs: Snappy, LZO, and Gzip.
🎯 Goal: Learn how to set up and use the Snappy, LZO, and Gzip compression codecs in Hadoop to compress and decompress data files.
📋 What You'll Learn
Create a sample text file in Hadoop filesystem
Set a variable for the compression codec
Write a Hadoop command to compress the file using the chosen codec
Display the compressed file size
💡 Why This Matters
🌍 Real World
Compression codecs help save storage space and speed up data processing in big data systems like Hadoop.
💼 Career
Knowing how to use compression codecs is important for data engineers and data scientists working with large datasets in Hadoop environments.
1
Create a sample text file in Hadoop filesystem
Use the Hadoop command hdfs dfs -put to upload a local file named sample.txt with the exact content:
Hello Hadoop Compression Codecs
to the Hadoop directory /user/hadoop/input/.
Hint: First create the directory, then upload the file using hdfs dfs -put.
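Assuming a running HDFS cluster where the current user can write to /user/hadoop, this step might look like the following sketch:

```shell
# Create the local file with the exact required content
echo "Hello Hadoop Compression Codecs" > sample.txt

# Create the target directory in HDFS (-p also creates missing parents)
hdfs dfs -mkdir -p /user/hadoop/input/

# Upload the local file into the HDFS directory
hdfs dfs -put sample.txt /user/hadoop/input/

# Verify the upload by printing the file back
hdfs dfs -cat /user/hadoop/input/sample.txt
```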

2
Set the compression codec variable
Create a shell variable called codec and set it to the exact value snappy to specify the compression codec.
Hint: Use codec=snappy to set the variable.
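A minimal sketch of this step; since the variable is plain shell, it can be checked locally before touching the cluster:

```shell
# Store the codec name; note there are no spaces around = in a shell assignment
codec=snappy

# Later commands interpolate it, for example into the output path
echo "/user/hadoop/output/sample.txt.$codec"
# prints /user/hadoop/output/sample.txt.snappy
```

To switch codecs for later runs, only this one line changes (codec=gzip, codec=lzo).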

3
Compress the file using the chosen codec
Use the hadoop jar command with the variable $codec to compress the file /user/hadoop/input/sample.txt into the output directory /user/hadoop/output/sample.txt.$codec (the job writes the compressed file inside this directory). Use the appropriate codec option.
Hint: Use -D mapreduce.output.fileoutputformat.compress=true and set the codec class for Snappy (org.apache.hadoop.io.compress.SnappyCodec).
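One way to do this is a map-only Hadoop Streaming job with output compression enabled. The streaming jar path below is an assumption and varies by distribution; adjust it to your install:

```shell
codec=snappy

# Map-only identity job: reads the input and rewrites it with output
# compression turned on. The wildcard matches the versioned jar name.
hadoop jar "$HADOOP_HOME"/share/hadoop/tools/lib/hadoop-streaming-*.jar \
  -D mapreduce.output.fileoutputformat.compress=true \
  -D mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.SnappyCodec \
  -input /user/hadoop/input/sample.txt \
  -output /user/hadoop/output/sample.txt.$codec \
  -mapper cat \
  -numReduceTasks 0
```

For Gzip, swap in org.apache.hadoop.io.compress.GzipCodec. LZO is not bundled with Hadoop; it requires the third-party hadoop-lzo library (com.hadoop.compression.lzo.LzoCodec) to be installed on the cluster.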

4
Display the compressed file size
Use the Hadoop command hdfs dfs -du -h to display the size of the compressed output under /user/hadoop/output/sample.txt.$codec.
Hint: Use hdfs dfs -du -h to check the file size.
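Assuming the job in step 3 wrote its output under /user/hadoop/output/, the size check might look like this:

```shell
codec=snappy

# -du reports space used; -h prints human-readable sizes (K, M, G)
hdfs dfs -du -h /user/hadoop/output/sample.txt.$codec

# List the part files the job produced; compressed output ends in .snappy
hdfs dfs -ls /user/hadoop/output/sample.txt.$codec
```

Comparing this number against the original sample.txt (hdfs dfs -du -h /user/hadoop/input/sample.txt) shows the effect of the codec, though on a file this small the savings are negligible.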