Exporting results to multiple formats in Pandas - Time & Space Complexity
When we save data from pandas to files, the time required depends on how much data we have and which format we choose.
We want to know how the export time grows as the data gets bigger.
Analyze the time complexity of the following code snippet.
```python
import pandas as pd

n = 1000  # Define n before using it

df = pd.DataFrame({
    'A': range(n),
    'B': range(n, 2 * n)
})

# Export to CSV
csv_path = 'output.csv'
df.to_csv(csv_path, index=False)

# Export to Excel (requires an engine such as openpyxl)
excel_path = 'output.xlsx'
df.to_excel(excel_path, index=False)
```
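As a quick sanity check (a minimal sketch; the file name and column names match the snippet above), the exported CSV can be read back to confirm that every row was written:

```python
import pandas as pd

n = 1000
df = pd.DataFrame({'A': range(n), 'B': range(n, 2 * n)})
df.to_csv('output.csv', index=False)

# Read the CSV back and confirm nothing was lost or duplicated
roundtrip = pd.read_csv('output.csv')
assert len(roundtrip) == n
assert list(roundtrip.columns) == ['A', 'B']
```

The round trip succeeds exactly because the export touched all n rows once.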
This code creates a DataFrame with n rows and exports it to CSV and Excel files.
Identify the loops, recursion, or array traversals that repeat:
- Primary operation: Writing each row of the DataFrame to the file format.
- How many times: Once per row, so n times for n rows.
As the number of rows increases, the time to write grows roughly in direct proportion.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 10 row writes |
| 100 | About 100 row writes |
| 1000 | About 1000 row writes |
Pattern observation: Doubling the rows roughly doubles the work needed to export.
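The doubling pattern can be observed directly with a rough timing sketch (absolute times vary by machine and disk, so only the ratio between consecutive runs matters; the helper name `time_csv_export` is ours, not a pandas API):

```python
import time
import pandas as pd

def time_csv_export(n):
    """Time a single to_csv call for a DataFrame with n rows."""
    df = pd.DataFrame({'A': range(n), 'B': range(n, 2 * n)})
    start = time.perf_counter()
    df.to_csv('bench_output.csv', index=False)
    return time.perf_counter() - start

for rows in (100_000, 200_000, 400_000):
    print(f"{rows:>7} rows: {time_csv_export(rows):.3f} s")
# Each doubling of the row count should roughly double the elapsed time.
```

On any given machine the measured ratios will hover around 2, which is the signature of linear, O(n), behavior.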
Time Complexity: O(n)
This means the time to export grows linearly with the number of rows in the DataFrame.
[X] Wrong: "Exporting to different formats takes the same time regardless of data size."
[OK] Correct: The time depends on how many rows you have because each row must be processed and written, so bigger data means more time.
Understanding how exporting scales helps you explain performance in real projects where data size changes often.
"What if we export only a subset of columns instead of all? How would the time complexity change?"
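One way to explore that question (a sketch; the `columns=` argument of `to_csv` is part of the pandas API): with k columns the total work is roughly proportional to n·k, so dropping columns shrinks the constant factor per row, but the complexity stays linear in the row count n.

```python
import pandas as pd

n = 1000
df = pd.DataFrame({
    'A': range(n),
    'B': range(n, 2 * n),
    'C': range(2 * n, 3 * n),
})

# Export only column 'A': still n row writes, but each write handles
# one field instead of three, so the work per row is smaller while
# the growth with n remains linear.
df.to_csv('subset.csv', columns=['A'], index=False)
```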