Selecting multiple columns in Pandas - Time & Space Complexity
We want to understand how the time it takes to select multiple columns from a table grows as the table gets bigger.
Specifically, how does the work change when the number of rows or columns changes?
Analyze the time complexity of the following code snippet.
import pandas as pd
n = 10  # Example value for n
data = pd.DataFrame({
    'A': range(n),
    'B': range(n),
    'C': range(n),
    'D': range(n)
})
selected = data[['A', 'C']]
This code creates a table with n rows and 4 columns, then selects two columns from it.
Identify the loops, recursion, or array traversals that repeat.
- Primary operation: Copying the values of the selected columns for every row.
- How many times: Once per row for each selected column (n times per column, so 2n values for the two columns here).
As the number of rows grows, the work to select columns grows proportionally.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 10 row operations (copying 2 values per row) |
| 100 | About 100 row operations |
| 1000 | About 1000 row operations |
Pattern observation: The work grows directly with the number of rows.
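The pattern can be checked by counting how many values the selection holds as n grows. The sketch below is only an illustration: the helper name `values_selected` is hypothetical and not part of pandas, and `DataFrame.size` (rows times columns) is used as a stand-in for the amount of data handled.

```python
import pandas as pd

def values_selected(n, columns):
    """Build an n-row, 4-column table and return how many values the selection holds."""
    data = pd.DataFrame({
        'A': range(n),
        'B': range(n),
        'C': range(n),
        'D': range(n),
    })
    selected = data[columns]
    return selected.size  # number of rows * number of selected columns

# Hypothetical check: the count grows in step with n for a fixed column list.
counts = {n: values_selected(n, ['A', 'C']) for n in (10, 100, 1000)}
print(counts)  # {10: 20, 100: 200, 1000: 2000}
```

The count is always twice n here, so multiplying the rows by ten multiplies the data handled by ten.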
Time Complexity: O(n)
This means the time to select columns grows linearly with the number of rows, as long as the number of selected columns stays fixed.
[X] Wrong: "Selecting columns is instant and does not depend on the number of rows."
[OK] Correct: Even though we only pick columns, selecting with a list of names builds a new table, and pandas copies the selected columns' data for every row, so more rows mean more work.
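One way to see that selection is not instant is to time it at a few table sizes. This is a rough sketch, not a rigorous benchmark: the helper `time_selection`, the sizes, and the repeat count are all assumptions chosen for illustration, and the measured numbers will vary by machine and pandas version.

```python
import time
import pandas as pd

def time_selection(n, repeats=5):
    """Return the best observed time (in seconds) to select two columns from n rows."""
    data = pd.DataFrame({name: range(n) for name in 'ABCD'})
    best = float('inf')
    for _ in range(repeats):
        start = time.perf_counter()
        data[['A', 'C']]  # the operation being measured
        best = min(best, time.perf_counter() - start)
    return best

# Hypothetical sizes; for small tables, fixed overhead tends to hide the trend.
timings = {n: time_selection(n) for n in (10_000, 100_000, 1_000_000)}
for n, seconds in timings.items():
    print(f"{n:>9} rows: {seconds:.6f} s")
```

On larger tables the measured time should rise roughly in proportion to n, while for small tables the fixed cost of the call dominates.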
Understanding how data selection scales helps you write efficient code and explain your choices clearly in real projects.
"What if we selected all columns instead of just two? How would the time complexity change?"