tar with compression (-z, -j, -J) in Linux CLI - Time & Space Complexity
When you use tar with a compression option, the time to create or extract an archive depends on how much data it has to process.
The question here is how execution time changes as the amount of data being compressed or decompressed grows.
Analyze the time complexity of the following tar command with compression.
tar -czf archive.tar.gz folder/
# or
# tar -cjf archive.tar.bz2 folder/
# or
# tar -cJf archive.tar.xz folder/
This command creates a compressed archive of the folder using gzip (-z), bzip2 (-j), or xz (-J) compression.
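As a minimal sketch of the command in use, the following creates a small sample folder, archives it with gzip, lists the archive, and extracts it again. The temporary directory, the file names, and the `restored` directory are made up for illustration. It assumes `tar` with gzip support and `mktemp` are available.

```shell
# Hypothetical working directory and sample files, for illustration only.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir -p folder
printf 'hello\n' > folder/a.txt
printf 'world\n' > folder/b.txt

# Create a gzip-compressed archive (-c create, -z gzip, -f archive file).
tar -czf archive.tar.gz folder/

# List the archive contents without extracting (-t list).
listing=$(tar -tzf archive.tar.gz)
printf '%s\n' "$listing"

# Extract into a separate directory (-x extract, -C target directory).
mkdir -p restored
tar -xzf archive.tar.gz -C restored

# diff -r exits with status 0 when both trees are identical.
diff -r folder restored/folder && roundtrip=ok
```

Swapping `-z` for `-j` or `-J` (and adjusting the file extension) should give the bzip2 and xz variants, provided those compressors are installed.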
Identify the loops, recursion, or array traversals that repeat.
- Primary operation: Reading each file's data and compressing it.
- How many times: Once for each byte of input data in the folder.
As the total size of files in the folder grows, the time to read and compress all data grows roughly in proportion.
| Input Size (n in MB) | Approx. Operations |
|---|---|
| 10 | Processes about 10 MB of data |
| 100 | Processes about 100 MB of data |
| 1000 | Processes about 1000 MB (1 GB) of data |
Pattern observation: For the same compressor, compression level, and kind of data, doubling the input size roughly doubles the time needed to compress.
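One way to check this pattern on your own machine is to time the same command on inputs that double in size. This is a rough sketch, not a benchmark: the sizes (4, 8, and 16 MB), the `data_*` names, and the use of random data are illustrative choices, and it assumes GNU `date` (for `%N` nanoseconds) and `/dev/urandom`.

```shell
# Assumes GNU date (%N) and /dev/urandom; sizes are kept small so the sketch runs quickly.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

for mb in 4 8 16; do
    mkdir -p "data_$mb"
    # Random bytes barely compress, so gzip has to work on every byte.
    head -c "$((mb * 1024 * 1024))" /dev/urandom > "data_$mb/blob.bin"

    start=$(date +%s%N)
    tar -czf "data_$mb.tar.gz" "data_$mb/"
    end=$(date +%s%N)

    # Elapsed wall-clock time in milliseconds.
    echo "${mb} MB: $(( (end - start) / 1000000 )) ms"
done
```

At sizes this small, process startup and disk caching add noise, so the ratio between runs is only approximately 2. Larger inputs tend to show the linear trend more clearly.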
Time Complexity: O(n)
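This means the time to compress grows linearly with the total size of the input data. The constant factor depends on the compressor: at default settings gzip is typically the fastest of the three and xz the slowest, but all three scale linearly.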
[X] Wrong: "Compression time depends only on the number of files, not their size."
[OK] Correct: Compression works on the actual data size, so larger files take more time even if the file count is small.
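The difference between file count and data size can be made concrete by measuring how many bytes tar hands to the compressor. The sketch below builds two hypothetical folders, `many_small` (100 files of 1 KiB) and `few_large` (1 file of 8 MiB), and measures the uncompressed tar stream for each. The names and sizes are illustrative assumptions.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# 100 files of 1 KiB each: many files, little data.
mkdir -p many_small
i=1
while [ "$i" -le 100 ]; do
    head -c 1024 /dev/urandom > "many_small/file_$i.bin"
    i=$((i + 1))
done

# 1 file of 8 MiB: few files, much more data.
mkdir -p few_large
head -c "$((8 * 1024 * 1024))" /dev/urandom > few_large/big.bin

# Size of the uncompressed tar stream, i.e. the bytes a compressor would have to process.
small_bytes=$(tar -cf - many_small | wc -c)
large_bytes=$(tar -cf - few_large | wc -c)

echo "many_small: 100 files, $small_bytes bytes to compress"
echo "few_large:  1 file, $large_bytes bytes to compress"
```

The folder with a single file produces by far the larger stream, so it takes longer to compress. Each file does add a small fixed cost (a tar header plus opening and reading the file), but for typical data the byte count dominates.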
Understanding how compression time scales helps you reason about script performance and system resource use in real tasks.
What if we used tar without compression? How would the time complexity change?