Reading CSV with options (sep, header, encoding) in Data Analysis Python - Time & Space Complexity
When we read CSV files with options such as the separator, header, and encoding, we want to know how the time to read grows as the file gets bigger.
We ask: how does the reading time change when the file has more rows or columns?
Analyze the time complexity of the following code snippet.
```python
import pandas as pd

df = pd.read_csv('data.csv', sep=';', header=0, encoding='utf-8')
print(df.head())
```
This code reads a CSV file using a semicolon as separator, treats the first line as header, and uses UTF-8 encoding.
Identify the operations that repeat: loops, recursion, or array traversals.
- Primary operation: Reading each line of the file and splitting it by the separator.
- How many times: Once for every row in the file (n times).
As the number of rows grows, the reading time grows roughly in the same way.
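To make the per-row work concrete, here is a deliberately simplified sketch of what a CSV reader does for each line. This is an illustrative model only, not pandas' actual (C-optimized) parser; the function name `read_csv_naive` is made up for this example.

```python
# A simplified model of the per-row work a CSV reader performs.
# Not pandas' real implementation -- just the shape of the work.

def read_csv_naive(path, sep=';', encoding='utf-8'):
    with open(path, encoding=encoding) as f:               # encoding applied as lines are decoded
        header = f.readline().rstrip('\n').split(sep)      # header=0: first line becomes the header
        rows = [line.rstrip('\n').split(sep) for line in f]  # one read + one split per row -> n operations
    return header, rows
```

Each data row triggers exactly one read and one split, which is the "n times" counted above.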
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 10 line reads and splits |
| 100 | About 100 line reads and splits |
| 1000 | About 1000 line reads and splits |
Pattern observation: The work grows directly with the number of rows; double the rows, double the work.
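The table's pattern can be checked directly by counting row-parse operations for growing inputs. This sketch uses the standard-library `csv` module (so it runs without pandas installed) and an in-memory file; the helper name `count_row_parses` is invented for the example.

```python
# Verifying the pattern: the number of row parses grows one-for-one
# with the number of data rows in the file.
import csv
import io

def count_row_parses(n_rows, sep=';'):
    # Build an in-memory CSV with a header line plus n_rows data rows.
    text = 'x;y\n' + '\n'.join(f'{i};{i * 2}' for i in range(n_rows))
    reader = csv.reader(io.StringIO(text), delimiter=sep)
    next(reader)                       # skip the header line
    return sum(1 for _ in reader)      # one parse per data row -> O(n)

for n in (10, 100, 1000):
    assert count_row_parses(n) == n    # double the rows, double the work
```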
Time Complexity: O(n)
This means the time to read the CSV grows linearly with the number of rows in the file.
Space Complexity: O(n) as well, since the resulting DataFrame keeps every parsed row in memory, so memory use also grows linearly with the row count.
[X] Wrong: "Changing the separator or encoding will make reading much slower in a way that changes the time complexity."
[OK] Correct: These options affect how each line is processed but do not change the fact that each line is read once, so the overall time still grows linearly with the number of rows.
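One way to see this point: parsing the same data with two different separators performs the same number of row reads; the option changes only the constant per-row cost of splitting, not the growth rate. A small sketch with the standard-library `csv` module (the helper `parse_rows` is invented for this example):

```python
# Separator (and likewise encoding) changes how each line is interpreted,
# not how many lines must be read.
import csv
import io

def parse_rows(text, sep):
    return list(csv.reader(io.StringIO(text), delimiter=sep))

n = 1000
semicolon_data = '\n'.join(f'{i};{i * 2}' for i in range(n))
comma_data = semicolon_data.replace(';', ',')

rows_semi = parse_rows(semicolon_data, ';')
rows_comma = parse_rows(comma_data, ',')
assert len(rows_semi) == len(rows_comma) == n  # same O(n) row count either way
```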
Understanding how file reading scales helps you explain data loading performance clearly and confidently in real projects or interviews.
"What if the CSV file has a very large number of columns instead of rows? How would the time complexity change?"
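As a hint toward that question: splitting a line with m fields touches every field, so the total work is proportional to the number of cells, n × m, not just the number of rows. A small sketch (the helper `cell_count` is invented for this example):

```python
# With m columns, each row split yields m fields, so total work
# scales with n * m cells rather than n rows alone.
import csv
import io

def cell_count(n_rows, m_cols):
    text = '\n'.join(','.join(str(j) for j in range(m_cols)) for _ in range(n_rows))
    return sum(len(row) for row in csv.reader(io.StringIO(text)))

assert cell_count(10, 4) == 40   # 10 rows x 4 columns
assert cell_count(10, 8) == 80   # doubling the columns doubles the work
```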