
Setting a column as index in Pandas - Time & Space Complexity

Time Complexity: Setting a column as index
O(n)
Understanding Time Complexity

When we set a column as the index in pandas, pandas builds a new row index from that column's values and returns a DataFrame that uses those values as its row labels.

We want to know how the time it takes grows as the data gets bigger.

Scenario Under Consideration

Analyze the time complexity of the following code snippet.

import pandas as pd

n = 10  # Example value for n

df = pd.DataFrame({
    'A': range(n),
    'B': range(n, 0, -1),
    'C': ['x'] * n
})

# Set column 'A' as index
indexed_df = df.set_index('A')

This code creates a DataFrame with n rows and then sets column 'A' as the index.
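As a quick check of what the snippet produces, the sketch below rebuilds the same DataFrame and inspects the result. It assumes the default arguments of set_index (drop=True, inplace=False), so column 'A' moves out of the columns and the original df is left unchanged.

```python
import pandas as pd

n = 10  # Example value for n

df = pd.DataFrame({
    'A': range(n),
    'B': range(n, 0, -1),
    'C': ['x'] * n
})

# With default arguments, set_index returns a new DataFrame
indexed_df = df.set_index('A')

print(indexed_df.index.name)     # A
print(list(indexed_df.columns))  # ['B', 'C']
print(list(df.columns))          # ['A', 'B', 'C'] (original is unchanged)
print(len(indexed_df) == n)      # True
```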

Identify Repeating Operations
  • Primary operation: pandas builds the new index from the values in column 'A', handling one value per row.
  • How many times: Once for each of the n rows in the DataFrame.
How Execution Grows With Input

As the number of rows increases, the work to set the index grows linearly with n.

Input Size (n) | Approx. Operations
10             | About 10 operations
100            | About 100 operations
1000           | About 1000 operations

Pattern observation: Doubling the rows roughly doubles the work needed.
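One way to check this pattern empirically is to time set_index at a few sizes. The sketch below is a rough measurement, not a rigorous benchmark. The helper name time_set_index and the chosen sizes are assumptions made for illustration. Actual timings depend on the machine and pandas version, and for small n the fixed per-call overhead can hide the linear trend.

```python
import timeit

import pandas as pd


def time_set_index(n, repeats=5):
    """Return the best-of-`repeats` time in seconds for one set_index call on n rows."""
    df = pd.DataFrame({
        'A': range(n),
        'B': range(n, 0, -1),
        'C': ['x'] * n,
    })
    # number=1 times a single call; taking the minimum reduces noise
    return min(timeit.repeat(lambda: df.set_index('A'), number=1, repeat=repeats))


sizes = [1_000, 10_000, 100_000]  # hypothetical sizes, chosen for illustration
timings = {n: time_set_index(n) for n in sizes}

for n, seconds in timings.items():
    print(f"n={n:>7}: {seconds:.6f} s")
```

If the growth is linear, the time for the largest size should be much larger than for the smallest, with the ratio approaching the ratio of the row counts as n grows.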

Final Time Complexity

Time Complexity: O(n)

This means the time to set the index grows in direct proportion to the number of rows.

Common Mistake

[X] Wrong: "Setting a column as index is instant no matter how big the data is."

[OK] Correct: pandas must process the index column's value for every row, so larger DataFrames take proportionally longer.

Interview Connect

Understanding how data operations scale helps you explain your code choices clearly and confidently.

Self-Check

"What if we set multiple columns as a multi-index? How would the time complexity change?"