sort and uniq in Linux CLI - Time & Space Complexity
When using the sort and uniq commands together, it is important to understand how execution time grows as the input file gets bigger.
Analyze the time complexity of the following code snippet.
```shell
sort input.txt | uniq
```
This code sorts the lines of a file and then removes duplicate lines that are next to each other.
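To see the pipeline in action, here is a minimal run on a throwaway sample file (the file name and contents are just for illustration):

```shell
# Create a small sample file with duplicate lines (contents are arbitrary).
printf 'banana\napple\nbanana\ncherry\napple\n' > input.txt

# sort brings identical lines next to each other; uniq then drops
# adjacent repeats, leaving one copy of each line.
sort input.txt | uniq
# prints: apple, banana, cherry (one per line)
```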
Identify the loops, recursion, or repeated traversals that drive the running time.
- Primary operation: Sorting all lines in the input file.
- How many times: The sorting process compares and rearranges lines multiple times depending on input size.
- Secondary operation: uniq scans the sorted lines once to remove adjacent duplicates.
- How many times: One pass through all lines.
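That single pass can be sketched with an awk one-liner: each line is compared only to the line immediately before it, so the work is linear in the number of lines. This mimics uniq's behavior; it is not its actual implementation:

```shell
# One linear pass: print a line only when it differs from the previous one.
printf 'apple\napple\nbanana\ncherry\ncherry\n' \
  | awk 'NR == 1 || $0 != prev { print } { prev = $0 }'
# prints: apple, banana, cherry (one per line)
```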
As the number of lines grows, sorting takes more time because it compares many pairs of lines. Removing duplicates is faster since it just checks neighbors once.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 30 to 50 comparisons for sorting, plus 10 checks for duplicates |
| 100 | About 700 to 1000 comparisons for sorting, plus 100 checks for duplicates |
| 1000 | About 10,000 to 15,000 comparisons for sorting, plus 1000 checks for duplicates |
Pattern observation: Sorting grows faster than linearly, while duplicate removal grows linearly.
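You can observe this growth empirically by timing the pipeline on progressively larger files. The use of shuf and the sizes below are arbitrary choices for the experiment; exact timings will vary by machine:

```shell
# Generate random-line files of increasing size and time the pipeline.
# Requires bash (for the `time` keyword) and GNU shuf.
for n in 1000 10000 100000; do
  shuf -i 1-1000 -r -n "$n" > "lines_$n.txt"
  echo "n=$n:"
  time sort "lines_$n.txt" | uniq > /dev/null
done
```

Because the cost is n log n rather than n squared, even the 100,000-line file should finish quickly; the point is that the time per line creeps up as n grows.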
Time Complexity: O(n log n)
This means the total time grows slightly faster than linearly in the number of lines, n log n rather than n, because the sorting step dominates as the file gets bigger.
[X] Wrong: "The whole command runs in linear time because uniq just scans once."
[OK] Correct: Sorting is the main cost and it takes more than linear time, so the total time is dominated by sorting, not just scanning.
Understanding how sorting affects performance helps you explain real-world scripts and commands clearly. This skill shows you can think about how tools behave as data grows.
"What if we used sort -u instead of sort | uniq? How would the time complexity change?"
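As a starting point for that question: sort -u folds duplicate removal into the sort itself, so the asymptotic cost stays O(n log n), though it saves a second process and a pipe. A quick check that the two approaches produce the same result (the sample data below is arbitrary):

```shell
# sort -u deduplicates while sorting: same O(n log n) bound, one process.
printf 'b\na\nb\nc\na\n' | sort | uniq > pipeline.txt
printf 'b\na\nb\nc\na\n' | sort -u > sorted_u.txt
# diff prints nothing and exits 0 when the outputs are identical.
diff pipeline.txt sorted_u.txt
```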