COPY command for bulk data loading in PostgreSQL - Time Complexity
When loading large amounts of data into a database, it helps to understand how load time grows as the data size increases. Specifically, we want to know how the COPY command's execution time changes as we add more rows to the input file.
Analyze the time complexity of this PostgreSQL COPY command.
`COPY my_table FROM '/path/to/data.csv' WITH (FORMAT csv, HEADER true);`
This command loads data from a CSV file into a table all at once.
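One practical variation worth knowing before analyzing the cost: server-side COPY reads the file with the database server's own filesystem permissions, so from psql you often use the client-side `\copy` meta-command instead, which streams the file from your machine. A brief sketch, reusing the table and placeholder path from the example above:

```sql
-- Server-side: the PostgreSQL server process reads the file directly.
COPY my_table FROM '/path/to/data.csv' WITH (FORMAT csv, HEADER true);

-- Client-side (psql only): psql reads the file and streams it to the server.
\copy my_table FROM '/path/to/data.csv' WITH (FORMAT csv, HEADER true)
```

Either way, every row in the file is read and inserted once, so the analysis below applies to both forms.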
Look at what repeats during the COPY operation.
- Primary operation: Reading and inserting each row from the file into the table.
- How many times: Once for every row in the input file.
As the number of rows grows, the time to load grows roughly in direct proportion.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 10 row insert operations |
| 100 | About 100 row insert operations |
| 1000 | About 1000 row insert operations |
Pattern observation: Doubling the rows roughly doubles the work and time.
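The doubling pattern can be checked empirically. As a minimal sketch, assuming Python's standard-library `sqlite3` as a stand-in for a database bulk load (an analogy for per-row processing, not PostgreSQL's COPY itself), each input row costs one insert operation, so the work, and roughly the time, scales linearly with the row count:

```python
import sqlite3
import time


def bulk_load(n: int) -> tuple[int, float]:
    """Insert n rows into an in-memory table; return (row_count, seconds)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER, val TEXT)")
    rows = ((i, f"row-{i}") for i in range(n))

    start = time.perf_counter()
    # One insert operation per input row -> O(n) total work.
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    conn.commit()
    elapsed = time.perf_counter() - start

    count = conn.execute("SELECT count(*) FROM t").fetchone()[0]
    conn.close()
    return count, elapsed


if __name__ == "__main__":
    for n in (10_000, 20_000, 40_000):
        count, secs = bulk_load(n)
        print(f"{count:>6} rows loaded in {secs:.4f}s")
```

Running this, the row counts double exactly, and the elapsed times roughly double as well, matching the table above. Wall-clock timings fluctuate, so the row count is the reliable measure of work done.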
Time Complexity: O(n)
This means the time to load data grows linearly with the number of rows being copied.
[X] Wrong: "COPY runs instantly no matter how much data there is."
[OK] Correct: Even though COPY is much faster than issuing individual INSERT statements, it still processes each row one by one (parsing it, writing it, and updating any indexes), so more rows mean more time.
Understanding how bulk loading scales helps you explain data import performance and troubleshoot delays in real projects.
What if we used COPY with a binary format instead of CSV? How would the time complexity change?
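As a starting point for that question: binary format skips per-row text parsing and type conversion, which lowers the constant factor, but the command still touches every row, so the complexity remains O(n). A brief sketch (the .bin path is a placeholder; the input file must have been produced by a binary-format COPY ... TO):

```sql
-- Export in PostgreSQL's binary format (placeholder path).
COPY my_table TO '/path/to/data.bin' WITH (FORMAT binary);

-- Re-import: no CSV parsing per row, but still one operation per row -> O(n).
COPY my_table FROM '/path/to/data.bin' WITH (FORMAT binary);
```

In other words, binary format can change how fast each row is processed, not how the total work grows with the number of rows.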