Selecting multiple columns in Pandas - Time & Space Complexity

Time complexity of selecting multiple columns: O(n)

Understanding Time Complexity

We want to understand how the time it takes to select multiple columns from a table grows as the table gets bigger.

Specifically, how does the work change when the number of rows or columns changes?

Scenario Under Consideration

Analyze the time complexity of the following code snippet.


import pandas as pd

n = 10  # Example value for n

data = pd.DataFrame({
    'A': range(n),
    'B': range(n),
    'C': range(n),
    'D': range(n)
})

selected = data[['A', 'C']]
    

This code creates a table with n rows and 4 columns, then selects two columns from it.

Identify Repeating Operations

Identify the loops, recursion, or array traversals that repeat.

  • Primary operation: Copying the values of the selected columns for every row.
  • How many times: Once for each row in the table (n times), for each of the 2 selected columns.

How Execution Grows With Input

As the number of rows grows, the work to select columns grows proportionally.

Input Size (n)    Approx. Operations
10                About 10 operations (copying 2 columns for 10 rows)
100               About 100 operations
1000              About 1000 operations

Pattern observation: The work grows directly with the number of rows.
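
One rough way to check this pattern is to count the cells in the selected result as n grows. This is a sketch, not a measurement of what pandas does internally: the helper name `cells_selected` is made up for this illustration, and it assumes each selected cell costs about one copy. With two columns the count is 2n, which differs from the table's "about n" only by a constant factor.

```python
import pandas as pd

def cells_selected(n, columns=("A", "C")):
    """Build the n-row, 4-column table from the scenario and return
    the number of cells in the selection (rows * selected columns)."""
    data = pd.DataFrame({
        "A": range(n),
        "B": range(n),
        "C": range(n),
        "D": range(n),
    })
    selected = data[list(columns)]
    return selected.size

# The cell count grows in step with n.
for n in (10, 100, 1000):
    print(n, cells_selected(n))
```

Multiplying n by 10 multiplies the cell count by 10, which is the linear growth described above.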

Final Time Complexity

Time Complexity: O(n)

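This means the time to select a fixed number of columns grows linearly with the number of rows. With k selected columns the copying work is roughly n × k values, which is still O(n) as long as k stays fixed.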
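
To see the trend on a real machine, the selection can be timed for a few table sizes. The sketch below uses the standard-library `timeit` module; the helper name `time_selection` and the chosen sizes are assumptions made for this example. Actual numbers depend on hardware and pandas version, and for small n a fixed per-call overhead tends to dominate, so the linear trend is usually only visible for large tables.

```python
import timeit
import pandas as pd

def time_selection(n, repeats=20):
    """Return the best-of-`repeats` time in seconds for selecting
    columns A and C from an n-row, 4-column DataFrame."""
    data = pd.DataFrame({
        "A": range(n),
        "B": range(n),
        "C": range(n),
        "D": range(n),
    })
    # Each repeat runs the selection once; keep the fastest run
    # to reduce noise from other activity on the machine.
    return min(timeit.repeat(lambda: data[["A", "C"]], number=1, repeat=repeats))

timings = {n: time_selection(n) for n in (10_000, 100_000, 1_000_000)}
for n, seconds in timings.items():
    print(f"n={n:>9,}: {seconds * 1e6:.1f} microseconds")
```

If the selection is O(n), the time for the largest table should be noticeably larger than for the smallest, though the ratio will rarely be an exact factor of 10 per step.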

Common Mistake

[X] Wrong: "Selecting columns is instant and does not depend on the number of rows."

[OK] Correct: Even though we only pick columns, pandas must copy data for every row, so more rows mean more work.
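
A small experiment can show that the selection behaves as independent data rather than a window onto the original table. This is only a sketch: it demonstrates that writing to the selection leaves the original unchanged, not exactly when or how pandas performs the copy, which can differ between pandas versions.

```python
import warnings
import pandas as pd

data = pd.DataFrame({
    "A": range(5),
    "B": range(5),
    "C": range(5),
    "D": range(5),
})
selected = data[["A", "C"]]

with warnings.catch_warnings():
    # Some pandas versions warn when writing to a selection;
    # silence that here because the write is intentional.
    warnings.simplefilter("ignore")
    selected.loc[0, "A"] = 99

print(selected.loc[0, "A"])  # value in the selection after the write
print(data.loc[0, "A"])      # value in the original table
```

The original table keeps its old value, so the selected columns must hold their own data, and producing that data takes work proportional to the number of rows.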

Interview Connect

Understanding how data selection scales helps you write efficient code and explain your choices clearly in real projects.

Self-Check

"What if we selected all columns instead of just two? How would the time complexity change?"