Setting a column as index in Pandas - Time & Space Complexity
When we set a column as the index in pandas, its values become the row labels that pandas uses for lookups and alignment.
We want to know how the time this takes grows as the number of rows increases.
Analyze the time complexity of the following code snippet.
import pandas as pd

n = 10  # Example value for n

df = pd.DataFrame({
    'A': range(n),
    'B': range(n, 0, -1),
    'C': ['x'] * n
})

# Set column 'A' as index
indexed_df = df.set_index('A')
This code creates a DataFrame with n rows and then sets column 'A' as the index.
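To make the effect of the snippet concrete, here is a minimal sketch that inspects the result. It reuses the same column names; the value `n = 5` is only an illustrative choice.

```python
import pandas as pd

n = 5  # small illustrative value (assumption, not from the original snippet)
df = pd.DataFrame({
    "A": range(n),
    "B": range(n, 0, -1),
    "C": ["x"] * n,
})

indexed_df = df.set_index("A")

# Column 'A' now labels the rows instead of appearing as a regular column.
print(indexed_df.index.name)     # A
print(list(indexed_df.columns))  # ['B', 'C']

# By default set_index returns a new DataFrame; the original keeps all three columns.
print(list(df.columns))          # ['A', 'B', 'C']
print(len(indexed_df) == n)      # True
```

The row count is unchanged: only the role of column 'A' differs between `df` and `indexed_df`.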
- Primary operation: pandas reads the values of column 'A' to build the new index and assembles a new DataFrame from the remaining columns.
- How many times: Once per row, so n times for a DataFrame with n rows.
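The counting model in the two points above can be sketched as a plain Python loop. This is a toy illustration only: pandas operates on whole arrays in compiled code rather than looping like this, and `count_row_visits` is a hypothetical helper name, not a pandas function.

```python
def count_row_visits(values):
    """Toy model: collect index labels while counting one visit per row."""
    visits = 0
    labels = []
    for value in values:  # a single pass over the column
        labels.append(value)
        visits += 1
    return labels, visits


for n in (10, 100, 1000):
    _, visits = count_row_visits(range(n))
    print(n, visits)  # the visit count equals n each time
```

Under this model the number of visits is exactly n, which is where the linear growth comes from.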
As the number of rows increases, the work to set the index grows linearly, that is, in proportion to n.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | ~10 |
| 100 | ~100 |
| 1000 | ~1000 |
Pattern observation: Doubling the rows roughly doubles the work needed.
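One way to check this pattern empirically is to time `set_index` at a few sizes. The sketch below is a rough measurement under stated assumptions: `time_set_index` is a hypothetical helper, the sizes and repeat count are arbitrary choices, and the actual numbers depend on the machine and pandas version.

```python
import time

import pandas as pd


def time_set_index(n, repeats=5):
    """Return the best wall-clock time in seconds of df.set_index('A') for n rows."""
    df = pd.DataFrame({
        "A": range(n),
        "B": range(n, 0, -1),
        "C": ["x"] * n,
    })
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        df.set_index("A")
        best = min(best, time.perf_counter() - start)
    return best


# Sizes chosen large enough that fixed per-call overhead matters less (assumption).
timings = {n: time_set_index(n) for n in (10_000, 100_000, 1_000_000)}
for n, seconds in timings.items():
    print(f"n={n:>9,}: {seconds * 1e3:.3f} ms")
```

For small n, fixed per-call overhead tends to dominate, and pandas versions that use Copy-on-Write may defer some copying, so measured times may not follow a clean straight line even though the model above is linear.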
Time Complexity: O(n)
This means the time to set the index grows in direct proportion to the number of rows.
[X] Wrong: "Setting a column as index is instant no matter how big the data is."
[OK] Correct: pandas must process every row to build the index, so the time grows with the number of rows.
Understanding how data operations scale helps you explain your code choices clearly and confidently.
"What if we set multiple columns as a multi-index? How would the time complexity change?"