Working with CSV files in Python - Time & Space Complexity
When working with CSV files, it's important to know how the time to process data grows as the file gets bigger.
We want to understand how reading and handling each row affects the total time.
Analyze the time complexity of the following code snippet.
import csv

def read_csv(filename):
    # newline='' lets the csv module handle line endings itself
    with open(filename, newline='') as csvfile:
        reader = csv.reader(csvfile)
        data = []
        for row in reader:  # one iteration per row in the file
            data.append(row)
    return data
This code reads all rows from a CSV file and stores them in a list.
- Primary operation: Looping through each row in the CSV file.
- How many times: Once for every row in the file (n times).
As the number of rows increases, the time to read and store them grows linearly.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 10 row reads and appends |
| 100 | About 100 row reads and appends |
| 1000 | About 1000 row reads and appends |
Pattern observation: The work grows evenly with the number of rows; doubling rows doubles work.
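The pattern in the table can be checked with a small counting sketch. It assumes an in-memory CSV built with io.StringIO in place of a real file; the helper name count_row_reads and the generated rows are hypothetical, chosen only for illustration.

```python
import csv
import io


def count_row_reads(n):
    """Build an in-memory CSV with n rows and count loop iterations while reading it."""
    # Hypothetical sample data: each row has two small fields.
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    for i in range(n):
        writer.writerow([f"name{i}", i])
    buffer.seek(0)

    reads = 0
    data = []
    for row in csv.reader(buffer):
        data.append(row)  # same work as in read_csv: one append per row
        reads += 1
    return reads


for n in (10, 100, 1000):
    print(n, count_row_reads(n))
```

The loop count equals the number of rows, so doubling n doubles the count.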
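Time Complexity: O(n), where n is the number of rows
This means the time to read the CSV grows in direct proportion to the number of rows, assuming each row has a roughly constant number of fields.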
[X] Wrong: "Reading a CSV file always takes the same time no matter how big it is."
[OK] Correct: The more rows there are, the more times the loop runs, so it takes longer.
Understanding how file reading time grows helps you write efficient data processing code and explain your reasoning clearly.
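On the space side, read_csv keeps every row in the data list, so its memory use also grows with n. The following is a minimal sketch of a streaming alternative, assuming rows can be handled one at a time. The name iter_csv_rows and the sample data are hypothetical.

```python
import csv
import io


def iter_csv_rows(csvfile):
    """Yield rows one at a time instead of collecting them in a list."""
    for row in csv.reader(csvfile):
        yield row


# Hypothetical in-memory data standing in for a real file.
sample = io.StringIO("Name,Age\nAlice,30\nBob,25\n")

# Counting rows this way never holds more than one row at a time.
row_count = sum(1 for _ in iter_csv_rows(sample))
print(row_count)
```

The time is still O(n) because every row is visited once, but the extra memory stays roughly constant when rows are not stored.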
"What if we processed each row twice inside the loop? How would the time complexity change?"
Practice
What does the csv.reader function do when working with CSV files?
Solution
Step 1: Understand the purpose of csv.reader
The csv.reader function returns a reader object that parses a CSV file and yields each row as a list of strings, one string per column.
Step 2: Differentiate from other CSV functions
csv.reader does not write or delete files. It only reads and parses rows.
Final Answer:
Reads the CSV file and returns each row as a list of values -> Option A
Quick Check:
csv.reader reads rows as lists [OK]
- Confusing reader with writer
- Thinking it deletes files
- Assuming it converts formats
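A short sketch of what csv.reader yields, assuming an in-memory sample in place of a real file (the sample contents are made up for illustration):

```python
import csv
import io

# Hypothetical in-memory data standing in for a real CSV file.
sample = io.StringIO("Name,Age\nAlice,30\n")

reader = csv.reader(sample)
rows = [row for row in reader]

print(rows)              # each row is a list of strings
print(type(rows[1][1]))  # values stay strings; csv.reader does no type conversion
```

Note that the age comes back as the string '30', not the integer 30.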
Solution
Step 1: Understand file modes in Python
The mode 'r' means open for reading, which is needed to read a CSV file.
Step 2: Check other modes
'w' is for writing (overwrites), 'a' is for appending, and 'x' is for creating a new file. None are for reading existing files.
Final Answer:
open('data.csv', 'r') -> Option C
Quick Check:
Use 'r' mode to read files [OK]
- Using 'w' which overwrites file
- Using 'a' which appends data
- Using 'x' which fails if file exists
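The four modes can be compared side by side in a small sketch. It assumes a temporary directory so that no real file is touched; the file name and contents are hypothetical.

```python
import os
import tempfile

# Work in a temporary directory so no real files are touched.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.csv")  # hypothetical file name

    with open(path, "w", newline="") as f:  # 'w' creates or overwrites
        f.write("Name,Age\n")

    with open(path, "a", newline="") as f:  # 'a' appends to the end
        f.write("Alice,30\n")

    with open(path, "r", newline="") as f:  # 'r' reads an existing file
        content = f.read()

    try:
        open(path, "x")  # 'x' refuses to open a file that already exists
        x_failed = False
    except FileExistsError:
        x_failed = True

print(repr(content))
print(x_failed)
```

Only 'r' leaves the file unchanged, which is why it is the right choice for reading.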
import csv

with open('data.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['Name', 'Age'])
    writer.writerow(['Alice', '30'])

with open('data.csv', 'r') as f:
    reader = csv.reader(f)
    rows = list(reader)

print(rows)
Solution
Step 1: Writing rows with csv.writer
The code writes two rows: header ['Name', 'Age'] and data ['Alice', '30'] as lists.
Step 2: Reading rows with csv.reader
Converting the reader to a list gives a list of lists; each inner list is one row split at the commas.
Final Answer:
[['Name', 'Age'], ['Alice', '30']] -> Option D
Quick Check:
list(csv.reader(f)) gives a list of lists [OK]
- Expecting a flat list instead of list of lists
- Thinking rows are single strings
- Omitting newline='' in open, which is not a syntax error but can add blank lines between rows on some platforms
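The same write-then-read round trip can be sketched without touching the disk. This version assumes an io.StringIO buffer in place of data.csv, and the field containing a comma is a made-up value added to show that quoting is handled by the csv module.

```python
import csv
import io

# newline='' mirrors the recommended way to open real CSV files.
buffer = io.StringIO(newline="")
writer = csv.writer(buffer)
writer.writerow(["Name", "City"])
writer.writerow(["Alice", "Paris, France"])  # field containing a comma

buffer.seek(0)  # rewind before reading back
rows = list(csv.reader(buffer))
print(rows)
```

The result is still a list of lists, and the comma inside the quoted field does not split it into two columns.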
import csv
with open('data.csv', 'r') as f:
    reader = csv.reader(f)
    for row in reader:
    print(row)
Solution
Step 1: Check indentation inside the for loop
The print statement must be indented inside the for loop to run for each row.
Step 2: Verify other parts
Import is present, file mode 'r' is correct for reading, and csv.reader works with the 'with' statement.
Final Answer:
Indentation error in the for loop body -> Option B
Quick Check:
Indent loop body correctly [OK]
- Not indenting loop body
- Changing file mode incorrectly
- Thinking csv.reader can't be used with 'with'
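A sketch of the corrected loop from the solution above, assuming in-memory data in place of data.csv. The printed list is a hypothetical addition that records what the loop processed.

```python
import csv
import io

# Hypothetical in-memory data standing in for data.csv.
sample = io.StringIO("Name,Age\nAlice,30\n")

printed = []
reader = csv.reader(sample)
for row in reader:
    # The loop body is indented one level deeper than the `for` line.
    print(row)
    printed.append(row)
```

With the body indented, the print call runs once per row instead of raising an IndentationError.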
Solution
Step 1: Use csv.DictReader to access columns by name
DictReader reads rows as dictionaries, so we can use keys like 'Name' and 'Age'.
Step 2: Create dictionary with names as keys and ages as integer values
The comprehension uses row['Name'] as key and converts row['Age'] to int for value.
Final Answer:
import csv
with open('data.csv', 'r') as f:
    reader = csv.DictReader(f)
    result = {row['Name']: int(row['Age']) for row in reader}
print(result)
-> Option A
Quick Check:
DictReader + dict comprehension with int conversion [OK]
- Using csv.reader without column names
- Swapping keys and values
- Not converting age to int
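To see why the int conversion matters, here is a self-contained sketch that inspects the DictReader rows before building the dictionary. It assumes in-memory sample data; the second person, Bob, is a made-up entry.

```python
import csv
import io

# Hypothetical in-memory data standing in for data.csv.
sample = io.StringIO("Name,Age\nAlice,30\nBob,25\n")

# The header row becomes the keys; every value is still a string.
rows = list(csv.DictReader(sample))
print(rows[0])

# Names become keys and ages are converted to integers.
result = {row["Name"]: int(row["Age"]) for row in rows}
print(result)
```

Without int(...), the ages would stay as strings such as '30' and numeric comparisons or arithmetic on them would not behave as expected.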
