Selecting Rows (loc, iloc) in Pandas for Data Analysis - Time & Space Complexity
When we select rows from a table, we want to know how the time taken changes as the data grows: does it depend on the size of the whole table, or only on the rows we pick?
Analyze the time complexity of the following code snippet.
```python
import pandas as pd

# Build a 1000-row DataFrame with two integer columns
data = pd.DataFrame({
    'A': range(1000),
    'B': range(1000, 2000)
})

# Label-based slice: rows with index labels 100 through 199
selected_rows = data.loc[100:199]
```
This code selects rows with index labels 100 through 199 using loc. Note that loc slices are inclusive of both endpoints, so this returns 100 rows.
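For comparison, the same rows can be selected positionally with iloc. A small sketch of the difference: unlike loc, iloc follows standard Python slicing rules and excludes the stop index.

```python
import pandas as pd

data = pd.DataFrame({
    'A': range(1000),
    'B': range(1000, 2000)
})

# Positional slice: iloc excludes the stop index, so iloc[100:200]
# yields the rows at positions 100 through 199 (100 rows).
by_position = data.iloc[100:200]

# With a default RangeIndex, labels and positions coincide,
# so this matches the label-based slice.
by_label = data.loc[100:199]

print(len(by_position))           # 100
print(by_position.equals(by_label))  # True
```

With a default RangeIndex the two selections are identical; the distinction matters once the index labels no longer match row positions.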
Identify the operations that repeat: loops, recursion, or array traversals.
- Primary operation: Accessing each row in the selected range.
- How many times: Once for each row in the slice (here, 100 times).
When you select more rows, the time grows roughly with how many rows you pick.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 10 row accesses |
| 100 | About 100 row accesses |
| 1000 | About 1000 row accesses |
Pattern observation: The time grows linearly with the number of rows selected.
Time Complexity: O(k), where k is the number of rows selected.
This means the time grows directly with the number of rows you select, not with the total table size.
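A quick illustration of this point (a sketch, not a rigorous benchmark): the same 100-row slice returns the same amount of data no matter how large the underlying table gets, so the work of the selection is governed by k, not n.

```python
import pandas as pd

for n in (1_000, 10_000, 100_000):
    df = pd.DataFrame({'A': range(n)})
    # The work done here scales with the ~100 rows selected,
    # not with the n rows in the table.
    chunk = df.loc[100:199]
    print(n, len(chunk))  # len(chunk) is 100 every time
```

To observe the timing behavior directly, you could wrap the slice in `timeit` and confirm that the measured time stays roughly flat as n grows.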
[X] Wrong: "Selecting rows always takes time proportional to the whole table size."
[OK] Correct: Actually, selecting a slice accesses only the rows you want, so time depends on how many rows you pick, not the entire table.
Understanding how data selection scales helps you write efficient code and explain your choices clearly in real projects.
What if we select rows using a condition instead of a slice? How would the time complexity change?
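As a starting point for that question, here is a minimal sketch of conditional selection with a boolean mask. Because the condition must be evaluated for every row, building the mask is O(n) in the total table size, regardless of how many rows end up matching.

```python
import pandas as pd

data = pd.DataFrame({
    'A': range(1000),
    'B': range(1000, 2000)
})

# Evaluating the condition touches every row, so this step
# is O(n) in the table size, even if few rows match.
mask = data['A'] > 900
filtered = data.loc[mask]

print(len(filtered))  # 99 rows: A values 901 through 999
```

So conditional selection shifts the complexity from O(k), the size of the result, to O(n), the size of the table, because there is no way to know which rows match without checking them all.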