
loc vs iloc mental model in Pandas - Performance Comparison

Time Complexity: loc vs iloc mental model
O(k)
Understanding Time Complexity

We want to understand how the time it takes to select data with loc and iloc changes as the data grows.

How does the size of the data affect the speed of these selection methods?

Scenario Under Consideration

Analyze the time complexity of selecting rows and columns using loc and iloc.

import pandas as pd

df = pd.DataFrame({
    'A': range(1000),
    'B': range(1000, 2000),
    'C': range(2000, 3000)
})

# Using loc to select rows by label (the end label 199 is included: 100 rows)
subset_loc = df.loc[100:199, ['A', 'B']]

# Using iloc to select rows by position (the end position 200 is excluded: 100 rows)
subset_iloc = df.iloc[100:200, 0:2]

This code selects 100 rows and 2 columns from a DataFrame of 1000 rows and 3 columns using both label-based and position-based indexing.
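As a quick sanity check, the two selections can be compared directly. This is a sketch that assumes the default RangeIndex shown above, where row labels coincide with row positions. The `shifted` frame, whose labels start at 1000, is a hypothetical variation added only to show that loc and iloc pick different rows once labels and positions no longer line up:

```python
import pandas as pd

df = pd.DataFrame({
    'A': range(1000),
    'B': range(1000, 2000),
    'C': range(2000, 3000)
})

subset_loc = df.loc[100:199, ['A', 'B']]   # label slice: end label 199 is included
subset_iloc = df.iloc[100:200, 0:2]        # position slice: end position 200 is excluded

# With the default RangeIndex, labels equal positions, so both selections match.
print(subset_loc.equals(subset_iloc))      # True

# Hypothetical frame whose row labels run from 1000 to 1999.
shifted = df.copy()
shifted.index = shifted.index + 1000
print(shifted.iloc[100:200].index[0])      # 1100: positions still start at 100
print(len(shifted.loc[100:199]))           # 0: no labels fall in 100..199
```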

Identify Repeating Operations

Look at what repeats when selecting data.

  • Primary operation: Accessing each value at the intersection of a requested row and a requested column.
  • How many times: For 100 rows and 2 columns, 200 data points are accessed.
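The count in the second bullet is simply the number of selected rows times the number of selected columns. The following minimal sketch reads that count off the result's `shape` and `size`. Note that `size` counts the values in the result, so it is a proxy for the work done rather than a measurement of it:

```python
import pandas as pd

df = pd.DataFrame({
    'A': range(1000),
    'B': range(1000, 2000),
    'C': range(2000, 3000)
})

subset = df.iloc[100:200, 0:2]   # 100 rows, columns A and B
rows, cols = subset.shape
print(rows, cols, rows * cols)   # 100 2 200
print(subset.size)               # 200: number of values in the selection
```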
How Execution Grows With Input

As the number of selected rows or columns grows, the work grows in proportion to the number of selected values (rows x columns).

Input size (rows x columns)   Approx. operations
10 x 2                        20
100 x 2                       200
1000 x 2                      2000

Pattern observation: The number of operations grows proportionally with the number of selected rows and columns.
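The rows of the table can be regenerated by taking the first r rows of columns A and B for increasing values of r. Like the table, this sketch counts the values returned rather than timing anything:

```python
import pandas as pd

df = pd.DataFrame({
    'A': range(1000),
    'B': range(1000, 2000),
    'C': range(2000, 3000)
})

# Number of values returned when selecting the first r rows of columns A and B.
sizes = {r: df.iloc[:r, 0:2].size for r in (10, 100, 1000)}

for r, n_values in sizes.items():
    print(f"{r} x 2 -> {n_values} values")
# 10 x 2 -> 20 values
# 100 x 2 -> 200 values
# 1000 x 2 -> 2000 values
```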

Final Time Complexity

Time Complexity: O(k)

Here k is the number of selected values (selected rows x selected columns). The time to select data grows linearly with the size of the selection, not with the size of the whole DataFrame.
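One way to probe this claim empirically is to keep the selection fixed at 100 x 2 while the DataFrame grows. The helper names `make_frame` and `time_selection` are hypothetical. Absolute timings depend on the machine and pandas version, so no numbers are shown. If the O(k) model holds, the per-call times should stay roughly flat as n grows:

```python
import timeit

import pandas as pd


def make_frame(n):
    # Hypothetical helper: same three-column layout as the example, but with n rows.
    return pd.DataFrame({
        'A': range(n),
        'B': range(n, 2 * n),
        'C': range(2 * n, 3 * n),
    })


def time_selection(n, repeats=200):
    # Hypothetical helper: average seconds per call for a fixed 100 x 2 selection.
    df = make_frame(n)
    t_loc = timeit.timeit(lambda: df.loc[100:199, ['A', 'B']], number=repeats)
    t_iloc = timeit.timeit(lambda: df.iloc[100:200, 0:2], number=repeats)
    return t_loc / repeats, t_iloc / repeats


results = {}
for n in (1_000, 100_000, 1_000_000):
    results[n] = time_selection(n)
    loc_s, iloc_s = results[n]
    print(f"n={n:>9,}: loc {loc_s * 1e6:8.1f} us, iloc {iloc_s * 1e6:8.1f} us")
```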

Common Mistake

[X] Wrong: "Selecting data with loc or iloc always scans the entire DataFrame, so it takes the same time no matter what."

[OK] Correct: Both methods access only the requested rows and columns, so the time depends on the size of the selection, not the full data.
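A rough way to see that loc does not need to scan the whole frame is to ask the index which positions a label slice resolves to. `Index.slice_indexer` and `Index.get_loc` are public pandas methods. This sketch assumes a unique, sorted index like the default RangeIndex used here; on an unsorted index with duplicate labels, the label lookup itself can be more expensive:

```python
import pandas as pd

df = pd.DataFrame({
    'A': range(1000),
    'B': range(1000, 2000),
    'C': range(2000, 3000)
})

# Resolve the label slice 100..199 (end label included) to a positional slice.
positions = df.index.slice_indexer(100, 199)
print(positions.start, positions.stop)   # 100 200: positions 100 up to, not including, 200

# Resolve the column labels to column positions.
col_positions = [df.columns.get_loc(c) for c in ['A', 'B']]

# Selecting by those positions gives the same result as the loc call.
by_position = df.iloc[positions, col_positions]
print(by_position.equals(df.loc[100:199, ['A', 'B']]))   # True
```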

Interview Connect

Understanding how data selection scales helps you write efficient code and explain your choices clearly in real projects and interviews.

Self-Check

What if we changed the selection to include all rows but only one column? How would the time complexity change?