loc vs iloc mental model in Pandas - Performance Comparison
We want to understand how the time it takes to select data with loc and iloc changes as the data grows.
How does the size of the data affect the speed of these selection methods?
Analyze the time complexity of selecting rows and columns using loc and iloc.
import pandas as pd
df = pd.DataFrame({
    'A': range(1000),
    'B': range(1000, 2000),
    'C': range(2000, 3000)
})
# Using loc to select rows by label (the end label 199 is included)
subset_loc = df.loc[100:199, ['A', 'B']]
# Using iloc to select rows by position (the end position 200 is excluded)
subset_iloc = df.iloc[100:200, 0:2]
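This code selects the same 100 rows and 2 columns from a DataFrame of 1000 rows and 3 columns, once with label-based indexing and once with position-based indexing. Because the DataFrame uses the default integer index (0 to 999), labels and positions coincide here.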
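As a quick check, the following sketch compares the two selections. It relies on the default integer index, where labels and positions happen to match; with a different index the two calls would generally not line up.

```python
import pandas as pd

df = pd.DataFrame({
    'A': range(1000),
    'B': range(1000, 2000),
    'C': range(2000, 3000)
})

# loc slices by label and includes the end label (199)
subset_loc = df.loc[100:199, ['A', 'B']]

# iloc slices by position and excludes the end position (200)
subset_iloc = df.iloc[100:200, 0:2]

print(subset_loc.shape)    # (100, 2)
print(subset_iloc.shape)   # (100, 2)

# equals() compares values, dtypes, and both axes
same = subset_loc.equals(subset_iloc)
print(same)
```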
Look at what repeats when selecting data.
- Primary operation: Accessing each requested cell (one row and column combination) in the DataFrame.
- How many times: Once per selected cell. For 100 rows and 2 columns, that is 100 x 2 = 200 data points.
As the number of rows or columns you select grows, the work grows roughly in proportion.
| Selection Size (rows x columns) | Approx. Operations (cells accessed) |
|---|---|
| 10 x 2 | 20 |
| 100 x 2 | 200 |
| 1000 x 2 | 2000 |
Pattern observation: The number of operations grows in proportion to the number of selected cells (rows x columns).
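The counts in the table can be reproduced with the `size` attribute of a DataFrame, which is its number of cells. This is a minimal sketch that only counts the cells in each selection; it does not measure what pandas does internally.

```python
import pandas as pd

df = pd.DataFrame({
    'A': range(1000),
    'B': range(1000, 2000),
    'C': range(2000, 3000)
})

cell_counts = {}
for n_rows in (10, 100, 1000):
    # Select the first n_rows rows and the first 2 columns
    selection = df.iloc[:n_rows, 0:2]
    cell_counts[n_rows] = selection.size
    print(f"{n_rows} x 2 -> {selection.size} cells")
```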
Time Complexity: O(k), where k is the number of selected cells (selected rows x selected columns)
This means the time to select data grows linearly with the size of the selection, not the whole DataFrame.
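One way to probe this claim is to time the same 100 x 2 selection against DataFrames of different total sizes. The helpers `make_frame` and `time_selection`, the row counts, and the repeat count below are illustrative choices, not part of the original example. Measured timings depend on the machine and the pandas version, and fixed per-call overhead can dominate for small selections, so treat the numbers as a rough indication only.

```python
import timeit

import numpy as np
import pandas as pd


def make_frame(n_rows):
    """Build a frame with three integer columns and the default integer index."""
    return pd.DataFrame({
        'A': np.arange(n_rows),
        'B': np.arange(n_rows, 2 * n_rows),
        'C': np.arange(2 * n_rows, 3 * n_rows),
    })


def time_selection(df, repeats=200):
    """Return average seconds per call for the same 100 x 2 selection."""
    loc_time = timeit.timeit(lambda: df.loc[100:199, ['A', 'B']], number=repeats)
    iloc_time = timeit.timeit(lambda: df.iloc[100:200, 0:2], number=repeats)
    return loc_time / repeats, iloc_time / repeats


timings = {}
for n_rows in (1_000, 100_000, 1_000_000):
    # Same selection size each time; only the total frame size changes
    timings[n_rows] = time_selection(make_frame(n_rows))
    loc_s, iloc_s = timings[n_rows]
    print(f"{n_rows:>9} rows: loc {loc_s * 1e6:.1f} us, iloc {iloc_s * 1e6:.1f} us")
```

If the per-call times stay roughly flat as the frame grows, that is consistent with the cost depending on the selection rather than on the whole DataFrame.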
[X] Wrong: "Selecting data with loc or iloc always scans the entire DataFrame, so it takes the same time no matter what."
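[OK] Correct: Both methods work only on the requested rows and columns, so the time depends mainly on the size of the selection, not the full data. This assumes pandas can locate the requested labels quickly, as it can with the default integer index used here.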
Understanding how data selection scales helps you write efficient code and explain your choices clearly in real projects and interviews.
What if we changed the selection to include all rows but only one column? How would the time complexity change?