Working with CSV files in Python - Time & Space Complexity
When working with CSV files, it's important to know how processing time grows as the file gets bigger. In particular, we want to understand how reading and handling each row affects the total time.
Analyze the time complexity of the following code snippet.
```python
import csv

def read_csv(filename):
    with open(filename, newline='') as csvfile:
        reader = csv.reader(csvfile)
        data = []
        for row in reader:
            data.append(row)
        return data
```
This code reads all rows from a CSV file and stores them in a list.
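To make this concrete, here is a small, self-contained usage sketch. The sample file contents and the use of a temporary file are illustrative choices, not part of the original code:

```python
import csv
import os
import tempfile

def read_csv(filename):
    with open(filename, newline='') as csvfile:
        reader = csv.reader(csvfile)
        data = []
        for row in reader:
            data.append(row)
        return data

# Create a tiny sample CSV file to demonstrate.
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False, newline='') as f:
    f.write("id,name\n1,Ada\n2,Grace\n")
    path = f.name

rows = read_csv(path)
os.remove(path)
print(rows)  # [['id', 'name'], ['1', 'Ada'], ['2', 'Grace']]
```

Every line in the file, including the header, becomes one list of strings, so the loop body runs exactly once per row.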
- Primary operation: Looping through each row in the CSV file.
- How many times: Once for every row in the file (n times).
As the number of rows increases, the time to read and store them grows in a straight line.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 10 row reads and appends |
| 100 | About 100 row reads and appends |
| 1000 | About 1000 row reads and appends |
Pattern observation: The work grows evenly with the number of rows; doubling rows doubles work.
Time Complexity: O(n)
This means the time to read the CSV grows directly with the number of rows.
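One way to see the linear growth directly is to time `read_csv` on generated files of increasing size. The row counts and column layout below are arbitrary choices for illustration; roughly, ten times the rows should take roughly ten times as long:

```python
import csv
import os
import tempfile
import time

def read_csv(filename):
    with open(filename, newline='') as csvfile:
        reader = csv.reader(csvfile)
        data = []
        for row in reader:
            data.append(row)
        return data

# Write temporary CSV files of increasing size and time each read.
for n in (1_000, 10_000, 100_000):
    with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False, newline='') as f:
        writer = csv.writer(f)
        for i in range(n):
            writer.writerow([i, f"name{i}", i * 2])
        path = f.name
    start = time.perf_counter()
    rows = read_csv(path)
    elapsed = time.perf_counter() - start
    os.remove(path)
    print(f"{n:>7} rows read in {elapsed:.4f}s")
```

The exact times will vary by machine, but the ratio between successive runs should stay close to the ratio of the row counts.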
[X] Wrong: "Reading a CSV file always takes the same time no matter how big it is."
[OK] Correct: The more rows there are, the more times the loop runs, so it takes longer.
Understanding how file reading time grows helps you write efficient data processing code and explain your reasoning clearly.
"What if we processed each row twice inside the loop? How would the time complexity change?"
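One way to explore this question is a sketch like the following, where the loop does two pieces of work per row (the specific work shown, stripping each cell and rejoining the row, is a hypothetical example). The loop still runs n times, so the total work is about 2n, and since constant factors are dropped, the time complexity stays O(n):

```python
import csv
import os
import tempfile

def read_csv_double_work(filename):
    """Read a CSV file, doing two pieces of work for each row."""
    with open(filename, newline='') as csvfile:
        reader = csv.reader(csvfile)
        data = []
        for row in reader:
            cleaned = [cell.strip() for cell in row]  # first pass over the row
            data.append(cleaned)
            _ = ",".join(cleaned)                     # second pass over the row
        return data

# Demonstrate on a tiny file that has stray whitespace around cells.
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False, newline='') as f:
    f.write("1, Ada \n2, Grace \n")
    path = f.name

print(read_csv_double_work(path))  # [['1', 'Ada'], ['2', 'Grace']]
os.remove(path)
```

Doubling the work per row doubles the total time, but the growth pattern is unchanged: doubling the rows still doubles the work.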