Renaming columns in Data Analysis Python - Time & Space Complexity
We want to see how the time it takes to rename columns changes as the number of columns grows.
How does the work increase when we rename more columns?
Analyze the time complexity of the following code snippet.
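import pandas as pd
# Build a DataFrame with 1000 columns (col0 ... col999), each holding 10 rows
df = pd.DataFrame({f'col{i}': range(10) for i in range(1000)})
# Map every old column name to its new name
new_names = {f'col{i}': f'new_col{i}' for i in range(1000)}
# Rename all columns in place using the mapping
df.rename(columns=new_names, inplace=True)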
This code creates a DataFrame with 1000 columns and renames all columns using a dictionary mapping.
- Primary operation: Renaming each column by looking it up in the dictionary and updating its name.
- How many times: Once for each column, so 1000 times in this example.
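The per-column lookup described above can be sketched in plain Python. This is a simplified model of the idea, not pandas' actual implementation; `rename_labels` is a hypothetical helper introduced only for illustration.

```python
def rename_labels(labels, mapping):
    """Return a new list of labels, replacing each one found in mapping.

    Hypothetical helper: a simplified model of renaming, not pandas internals.
    """
    # One dictionary lookup per label; labels missing from the mapping are kept.
    return [mapping.get(label, label) for label in labels]


labels = ['col0', 'col1', 'col2', 'extra']
mapping = {'col0': 'new_col0', 'col1': 'new_col1', 'col2': 'new_col2'}

renamed = rename_labels(labels, mapping)
print(renamed)  # ['new_col0', 'new_col1', 'new_col2', 'extra']
```

The list comprehension visits every label exactly once, which is where the "once for each column" count comes from.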
As the number of columns increases, the time to rename them grows roughly in proportion.
| Number of Columns (n) | Approx. Operations |
|---|---|
| 10 | About 10 lookups and renames |
| 100 | About 100 lookups and renames |
| 1000 | About 1000 lookups and renames |
Pattern observation: The work grows directly with the number of columns; doubling columns roughly doubles the work.
Time Complexity: O(n)
This means the time to rename columns grows linearly with n, the number of columns, because each column needs one dictionary lookup, which takes constant time on average.
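One way to check the linear trend empirically is to time the rename at several column counts. The sketch below is a rough measurement, not a rigorous benchmark: `time_rename` and its `repeats` parameter are hypothetical names, and it calls `rename` without `inplace=True` so the same DataFrame can be reused across runs. Actual timings depend on the machine and pandas version, and fixed overhead can dominate at small sizes, so the ratios will only approximate 10x.

```python
import timeit

import pandas as pd


def time_rename(n_cols, repeats=5):
    """Return the average seconds taken to rename n_cols columns (hypothetical helper)."""
    df = pd.DataFrame({f'col{i}': range(10) for i in range(n_cols)})
    new_names = {f'col{i}': f'new_col{i}' for i in range(n_cols)}
    # rename returns a new DataFrame here, so df is unchanged between repeats.
    total = timeit.timeit(lambda: df.rename(columns=new_names), number=repeats)
    return total / repeats


timings = {n: time_rename(n) for n in (10, 100, 1000)}
for n, seconds in timings.items():
    print(f'{n:>5} columns: {seconds * 1e6:.1f} microseconds per rename')
```

If the growth is linear, the time for 1000 columns should be noticeably larger than for 100, though not necessarily by an exact factor of ten.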
[X] Wrong: "Renaming columns is instant no matter how many columns there are."
[OK] Correct: Each column must be checked and renamed, so more columns mean more work and more time.
Understanding how operations scale with data size helps you write efficient data code and explain your choices clearly.
"What if we rename only a few columns instead of all? How would the time complexity change?"