Writing to Excel with to_excel in Pandas - Time & Space Complexity
When saving data to an Excel file using pandas, it is important to understand how the running time grows as the data size increases. Specifically, we want to know how the writing process scales as the number of rows or columns grows.
Analyze the time complexity of the following code snippet.
```python
import pandas as pd

# Build a DataFrame with 1000 rows and 2 columns
df = pd.DataFrame({
    'A': range(1000),
    'B': range(1000, 2000)
})

# Write it to an Excel file, omitting the index column
df.to_excel('output.xlsx', index=False)
```
This code creates a DataFrame with 1000 rows and 2 columns, then writes it to an Excel file without the index.
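As a quick sanity check, a sketch like the following confirms the dimensions of the DataFrame: `shape` reports (rows, columns) and `size` reports the total number of cells that `to_excel` must write.

```python
import pandas as pd

df = pd.DataFrame({
    'A': range(1000),
    'B': range(1000, 2000)
})

# shape gives (rows, columns); size gives the total number of cells
print(df.shape)  # (1000, 2)
print(df.size)   # 2000 cells to write
```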
Identify the repeated work: the loops, recursion, or traversals over the data.
- Primary operation: Writing each cell's data to the Excel file.
- How many times: Once for each cell in the DataFrame (rows x columns).
As the number of rows or columns increases, the number of cells to write grows proportionally.
| Input Size (rows x columns) | Approx. Operations |
|---|---|
| 10 x 2 = 20 | About 20 write operations |
| 100 x 2 = 200 | About 200 write operations |
| 1000 x 2 = 2000 | About 2000 write operations |
Pattern observation: The time grows roughly in direct proportion to the total number of cells.
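The pattern in the table can be reproduced directly with `DataFrame.size`, which counts total cells (rows × columns), matching the approximate write-operation counts above:

```python
import pandas as pd

for rows in (10, 100, 1000):
    df = pd.DataFrame({'A': range(rows), 'B': range(rows)})
    # Each cell is written once, so operations ≈ rows * columns
    print(f"{rows} x 2 = {df.size} write operations")
```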
Time Complexity: O(n × m)
This means the time to write grows linearly with the number of rows (n) times the number of columns (m).
[X] Wrong: "Writing to Excel takes the same time no matter how big the DataFrame is."
[OK] Correct: Writing involves saving every cell's data, so more rows or columns mean more work and more time.
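To see the linear trend empirically, one can time `to_excel` on DataFrames of increasing size. A minimal sketch follows; it assumes an Excel engine such as openpyxl is installed, and writes to an in-memory buffer to reduce disk noise. Absolute timings are machine-dependent, so only the growth trend matters:

```python
import io
import time

import pandas as pd

for rows in (1_000, 2_000, 4_000):
    df = pd.DataFrame({'A': range(rows), 'B': range(rows)})
    buf = io.BytesIO()  # write in memory instead of to disk
    start = time.perf_counter()
    df.to_excel(buf, index=False)  # requires an Excel engine, e.g. openpyxl
    elapsed = time.perf_counter() - start
    print(f"{rows:>5} rows: {elapsed:.4f} s")
```

Doubling the number of rows should roughly double the elapsed time, consistent with O(n × m).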
Understanding how data export time grows helps you write efficient data pipelines and manage large datasets smoothly.
What if we added compression or saved to a different file format? How would the time complexity change?