cut and paste in Bash Scripting - Time & Space Complexity
When a bash script uses the cut and paste commands, it helps to know how its running time changes as the input grows.
The goal here is to see how the script's speed responds to larger files, measured in number of lines.
Analyze the time complexity of the following code snippet.
#!/bin/bash
input_file="data.txt"
output_file="result.txt"
# Extract the first column and write it to the output file
cut -d',' -f1 "$input_file" > "$output_file"
# Extract the second column and join it, line by line, beside the first
cut -d',' -f2 "$input_file" | paste "$output_file" - > temp && mv temp "$output_file"
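This script extracts the first and second columns from a CSV file and combines them side by side into a new file. Because paste joins corresponding lines with a tab by default, the columns in the result are tab-separated rather than comma-separated.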
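To make the script's effect concrete, here is a minimal sketch that runs the same two steps on a tiny sample. The scratch directory, the sample rows, and the `workdir` and `result` variables are assumptions added for illustration; they are not part of the original script.

```bash
#!/bin/bash
# Hypothetical sample data, created in a scratch directory for illustration.
workdir=$(mktemp -d)
input_file="$workdir/data.txt"
output_file="$workdir/result.txt"
printf 'alice,30,paris\nbob,25,rome\n' > "$input_file"

# Same two steps as the script above, with the temp file kept in the scratch directory.
cut -d',' -f1 "$input_file" > "$output_file"
cut -d',' -f2 "$input_file" | paste "$output_file" - > "$workdir/temp" \
  && mv "$workdir/temp" "$output_file"

# Print the result with each tab shown as "|" so the separator is visible.
result=$(cat "$output_file")
printf '%s\n' "$result" | tr '\t' '|'
```

With this sample the output should be `alice|30` and `bob|25`, one pair per line: the third column is dropped and the first two are joined by paste.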
Identify the operations that repeat, such as loops, recursion, or passes over the data.
- Primary operation: Reading each line of the input file, once per cut command.
- How many times: Each line is read twice, once for each column extraction. The paste command then makes one more linear pass to join the two columns.
As the number of lines in the input file grows, the script reads through all lines twice.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 20 line reads |
| 100 | About 200 line reads |
| 1000 | About 2000 line reads |
Pattern observation: The total work is about 2n line reads for n lines, because each line is read two times. Doubling the input doubles the work; the factor of 2 is a constant that does not change how the work scales.
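The counts in the table can be reproduced with a rough sketch like the one below. It generates files of 10, 100, and 1000 lines and counts how many lines the two cut passes handle in total. The generated data, the `sample` temp file, and the `report` variable are assumptions for illustration, and the count covers only the two cut passes, not the extra pass made by paste.

```bash
#!/bin/bash
# Count the lines handled by the two cut passes for several input sizes.
sample=$(mktemp)

report=$(
  for n in 10 100 1000; do
    # Build a hypothetical n-line CSV: "1,x", "2,x", ...
    seq "$n" | sed 's/$/,x/' > "$sample"

    # Each cut emits one output line per input line it reads.
    pass1=$(cut -d',' -f1 "$sample" | wc -l)
    pass2=$(cut -d',' -f2 "$sample" | wc -l)

    echo "$n $((pass1 + pass2))"
  done
)

echo "$report"
rm -f "$sample"
```

Each output line pairs n with the total number of line reads, which stays at 2n for every size tried.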
Time Complexity: O(n)
This means the time to run the script grows linearly with the number of lines in the input file.
[X] Wrong: "Since there are two cut commands, the time complexity is squared, like O(n²)."
[OK] Correct: Each cut command reads the file once, so the total work is two times n, which is still linear, not squared.
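For contrast, quadratic growth would need a full pass over the file for every line, which is a nested pattern rather than two passes in sequence. The sketch below counts line reads for both patterns on a hypothetical 10-line file; the nested loop is a deliberately inefficient example made up for comparison and does not appear in the original script.

```bash
#!/bin/bash
# Hypothetical 10-line CSV used only for counting line reads.
sample=$(mktemp)
seq 10 | sed 's/$/,x/' > "$sample"

# Sequential pattern (as in the script): two full passes, one after the other.
sequential=$(( $(cut -d',' -f1 "$sample" | wc -l) + $(cut -d',' -f2 "$sample" | wc -l) ))

# Nested pattern (illustrative only): one full pass for EVERY line of the file.
nested=0
while IFS= read -r _line; do
  scanned=$(cut -d',' -f1 "$sample" | wc -l)
  nested=$(( nested + scanned ))
done < "$sample"

echo "sequential: $sequential, nested: $nested"
rm -f "$sample"
```

For 10 lines this should print `sequential: 20, nested: 100`, which matches 2n against n² and shows why two commands in a row do not square the cost.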
Understanding how simple commands like cut and paste scale with input size helps you write efficient scripts and explain your reasoning clearly in real-world tasks.
What if we combined the two cut commands into one that extracts both columns at once? How would the time complexity change?
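One way to explore this question is sketched below, assuming the fields themselves contain no tabs. A single `cut -f1,2` reads the input once, and `tr` turns the comma it keeps between the fields into the tab that paste would have produced. The sample rows and the `sample`, `two_pass`, `one_pass`, and `same` names are hypothetical.

```bash
#!/bin/bash
# Hypothetical sample data and scratch files for comparing both approaches.
sample=$(mktemp)
two_pass=$(mktemp)
one_pass=$(mktemp)
printf 'alice,30,paris\nbob,25,rome\n' > "$sample"

# Two-pass approach: the input is read once per column, then joined by paste.
cut -d',' -f1 "$sample" > "$two_pass.col1"
cut -d',' -f2 "$sample" | paste "$two_pass.col1" - > "$two_pass"

# One-pass approach: both columns in a single read; commas become tabs.
cut -d',' -f1,2 "$sample" | tr ',' '\t' > "$one_pass"

# Compare the two results byte for byte.
if cmp -s "$two_pass" "$one_pass"; then same=yes; else same=no; fi
echo "outputs identical: $same"

rm -f "$sample" "$two_pass" "$two_pass.col1" "$one_pass"
```

The single-pass version reads about n lines instead of 2n, so the constant factor shrinks, but the work still grows in proportion to n and the time complexity remains O(n).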