Selecting columns by name in Pandas - Time & Space Complexity

Time Complexity (selecting columns by name): O(n)
Understanding Time Complexity

When we select columns by their names in pandas, we want to know how the time it takes changes as the data grows.

We ask: How does the work increase when the number of rows or columns grows?

Scenario Under Consideration

Analyze the time complexity of the following code snippet.

import pandas as pd

df = pd.DataFrame({
    'A': range(1000),
    'B': range(1000, 2000),
    'C': range(2000, 3000)
})

selected = df[['A', 'C']]

This code creates a DataFrame with 3 columns and 1000 rows, then selects two of the columns ('A' and 'C') by name.

Identify Repeating Operations

Identify the loops, recursion, or array traversals that repeat.

  • Primary operation: Accessing and copying the selected columns from the DataFrame.
  • How many times: Once per row for each selected column.
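
As a rough mental model, the repeated work can be pictured as a nested loop that copies one value per row for each selected column. The sketch below is an illustration only: the helper count_copy_operations is hypothetical, and pandas itself copies whole arrays in optimized native code rather than looping in Python.

```python
def count_copy_operations(columns, selected_names):
    """Copy the selected columns value by value and count the copies.

    Illustrative model only; this is not how pandas is implemented.
    """
    result = {}
    operations = 0
    for name in selected_names:        # once per selected column
        copied = []
        for value in columns[name]:    # once per row
            copied.append(value)
            operations += 1
        result[name] = copied
    return result, operations


# Plain lists stand in for the DataFrame columns from the scenario.
columns = {
    "A": list(range(1000)),
    "B": list(range(1000, 2000)),
    "C": list(range(2000, 3000)),
}
selected, operations = count_copy_operations(columns, ["A", "C"])
print(operations)  # 2000 (2 columns x 1000 rows)
```

Column 'B' is never touched, so it contributes nothing to the count.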

How Execution Grows With Input

As the number of rows grows, the work to copy selected columns grows proportionally.

Input Size (n rows)    Approx. Operations
10                     20 (2 columns x 10 rows)
100                    200 (2 columns x 100 rows)
1000                   2000 (2 columns x 1000 rows)

Pattern observation: The operations grow linearly with the number of rows.
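
One way to check this pattern empirically is to time the same selection at several row counts. The sketch below assumes pandas is installed; the helper time_selection and the chosen row counts are illustrative, and the measured times will differ between machines and pandas versions.

```python
import timeit

import pandas as pd


def time_selection(n_rows, repeats=20):
    """Return the best observed time (seconds) to select two of three columns."""
    df = pd.DataFrame({
        "A": range(n_rows),
        "B": range(n_rows, 2 * n_rows),
        "C": range(2 * n_rows, 3 * n_rows),
    })
    # Run the selection several times and keep the fastest run,
    # which is the one least disturbed by other activity on the machine.
    timings = timeit.repeat(lambda: df[["A", "C"]], number=1, repeat=repeats)
    return min(timings)


for n_rows in (10_000, 100_000, 1_000_000):
    print(f"{n_rows:>9} rows: {time_selection(n_rows):.6f} s")
```

For very small DataFrames, fixed per-call overhead tends to dominate, so the linear trend is usually easier to see at larger row counts.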

Final Time Complexity

Time Complexity: O(n)

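This means the time to select columns grows in direct proportion to the number of rows in the DataFrame. Strictly, the work is proportional to n x k for k selected columns; with k fixed (here k = 2), that simplifies to O(n).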

Common Mistake

[X] Wrong: "Selecting columns by name is instant and does not depend on data size."

[OK] Correct: Even though the columns are looked up by name, selecting a list of columns builds a new DataFrame, and pandas must copy the data for each row in those columns, so the time grows with the number of rows.
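
A small experiment can show that the selection behaves as a separate object rather than a window onto the original. The sketch below reuses the DataFrame from the scenario and assumes pandas is installed. It demonstrates only that the two objects are independent; exactly when pandas copies the underlying data may vary between versions and settings.

```python
import pandas as pd

df = pd.DataFrame({
    "A": range(1000),
    "B": range(1000, 2000),
    "C": range(2000, 3000),
})

selected = df[["A", "C"]]

# Change the original after the selection was made.
df.loc[0, "A"] = -1

print(df.loc[0, "A"])        # -1
print(selected.loc[0, "A"])  # 0
```

The selection keeps its own value for row 0, which is consistent with the data having been duplicated rather than shared.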

Interview Connect

Understanding how data selection scales helps you write efficient code and explain your choices clearly in real projects.

Self-Check

"What if we select all columns instead of just a few? How would the time complexity change?"