iloc for position-based selection in Pandas - Time & Space Complexity
We want to understand how the time it takes to select data using iloc changes as the data size grows.
How does the number of rows or columns affect the work done when using iloc?
Analyze the time complexity of the following code snippet.
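```python
import pandas as pd

# 1000 rows x 3 integer columns
df = pd.DataFrame({
    'A': range(1000),
    'B': range(1000, 2000),
    'C': range(2000, 3000)
})

# Rows at positions 100-199 and columns at positions 1-2 ('B' and 'C')
subset = df.iloc[100:200, 1:3]
```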
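This code creates a DataFrame with 1000 rows and 3 columns. It then uses iloc to select the rows at positions 100 through 199 and the columns at positions 1 and 2 ('B' and 'C'). As with ordinary Python slices, the end position of an iloc slice is excluded.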
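A quick way to confirm what the slice contains is to inspect its shape, column labels, and first row. This sketch rebuilds the same DataFrame so it runs on its own. The printed values follow directly from how the columns were constructed (row position 100 holds A=100, B=1100, C=2100).

```python
import pandas as pd

# Rebuild the same 1000 x 3 DataFrame used above.
df = pd.DataFrame({
    'A': range(1000),
    'B': range(1000, 2000),
    'C': range(2000, 3000),
})

# Rows at positions 100..199 (stop is exclusive), columns at positions 1..2.
subset = df.iloc[100:200, 1:3]

print(subset.shape)             # (100, 2)
print(list(subset.columns))     # ['B', 'C']
print(subset.iloc[0].tolist())  # [1100, 2100]
```

The row labels are kept from the original DataFrame, so `subset.index` runs from 100 to 199 rather than restarting at 0.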
Next, identify the operations that repeat (loops, recursion, and array traversals).
- Primary operation: Reading each cell in the selected slice, and copying it when the result is a new array.
- How many times: Once per selected cell, that is, (number of selected rows) x (number of selected columns).
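To make this cost model concrete, here is a minimal sketch of a cell-by-cell positional copy. The helper `naive_iloc_slice` is hypothetical and exists only for illustration. pandas does not implement iloc this way internally, but the nested loops make the r x c access count explicit.

```python
import pandas as pd

def naive_iloc_slice(df, row_start, row_stop, col_start, col_stop):
    """Hypothetical helper: copy a positional slice one cell at a time.

    Returns the copied DataFrame and the number of cell accesses performed.
    """
    accesses = 0
    rows = []
    for i in range(row_start, row_stop):        # r iterations
        row = []
        for j in range(col_start, col_stop):    # c iterations per row
            row.append(df.iat[i, j])            # one scalar read by position
            accesses += 1
        rows.append(row)
    result = pd.DataFrame(
        rows,
        index=df.index[row_start:row_stop],
        columns=df.columns[col_start:col_stop],
    )
    return result, accesses

df = pd.DataFrame({
    'A': range(1000),
    'B': range(1000, 2000),
    'C': range(2000, 3000),
})

copied, count = naive_iloc_slice(df, 100, 200, 1, 3)
print(count)  # 200 = 100 rows x 2 columns
```

The result holds the same values as `df.iloc[100:200, 1:3]`, and the access counter equals the number of selected cells.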
When you select more rows or columns, the work grows with the size of the slice.
| Selection size (rows x columns) | Approx. cell accesses |
|---|---|
| 10 x 2 | 20 |
| 100 x 2 | 200 |
| 100 x 3 | 300 |
Pattern observation: The operations grow roughly in direct proportion to the number of cells selected.
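The counts in the table are simply the number of cells in each selection, which pandas reports as `.size` (rows times columns). This sketch assumes each selection starts at position 0. Any r x c block gives the same counts.

```python
import pandas as pd

df = pd.DataFrame({
    'A': range(1000),
    'B': range(1000, 2000),
    'C': range(2000, 3000),
})

# (rows, columns) pairs from the table above
shapes = [(10, 2), (100, 2), (100, 3)]

# .size = number of rows * number of columns in the selection
sizes = {(r, c): df.iloc[:r, :c].size for r, c in shapes}
print(sizes)  # {(10, 2): 20, (100, 2): 200, (100, 3): 300}
```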
Time Complexity: O(r x c)
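This means the time grows in proportion to the number of selected rows (r) times the number of selected columns (c), that is, the number of cells in the selection. Treat this as an upper bound. For plain contiguous slices like `df.iloc[100:200, 1:3]`, pandas can often return a view of the existing data instead of copying every cell, so the real cost may be much smaller. List-based selections such as `df.iloc[[1, 5, 7], [0, 2]]` build a new array and do pay for every selected cell.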
[X] Wrong: "Selecting data with iloc always takes the same time no matter how much data is selected."
[OK] Correct: The time depends on how many rows and columns you pick, because producing the result can require touching every selected cell.
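One way to see this on your own machine is to time selections of increasing size. The sketch below uses list-based positions. This is an assumption chosen because list positions make pandas gather the selected values into a new array, so every cell is touched. The 100,000-row DataFrame and the repeat counts are arbitrary. Absolute timings depend on hardware and pandas version. For small selections, fixed per-call overhead dominates, so expect the growth to look roughly proportional only as selections get larger.

```python
import timeit

import numpy as np
import pandas as pd

# A larger all-integer DataFrame so timing differences are visible.
df = pd.DataFrame(np.arange(300_000).reshape(100_000, 3), columns=['A', 'B', 'C'])

def time_list_selection(n_rows, n_cols, repeat=5, number=10):
    """Best average time (seconds) of one list-based iloc selection."""
    rows = list(range(n_rows))  # explicit positions: pandas gathers a new array
    cols = list(range(n_cols))
    timer = timeit.Timer(lambda: df.iloc[rows, cols])
    return min(timer.repeat(repeat=repeat, number=number)) / number

times = {n: time_list_selection(n, 2) for n in (1_000, 10_000, 100_000)}
for n_rows, seconds in times.items():
    print(f"{n_rows:>7} rows x 2 cols: {seconds * 1e6:9.1f} us")
```

Taking the minimum of several repeats reduces noise from other processes, which is the usual `timeit` practice.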
Understanding how data selection scales helps you write efficient code and explain your choices clearly in real projects and interviews.
What if we changed the selection to only one column but many rows? How would the time complexity change?