MultiIndex (hierarchical indexing) in Data Analysis Python - Time & Space Complexity
We want to understand how the time to work with MultiIndex in pandas grows as the data size increases.
Specifically, how does accessing or slicing data with hierarchical indexes affect speed?
Analyze the time complexity of the following code snippet.
import pandas as pd
n = 10 # example value
m = 5 # example value
# Create MultiIndex from two lists
index = pd.MultiIndex.from_product([range(n), range(m)], names=['level_1', 'level_2'])
# Create DataFrame with MultiIndex
df = pd.DataFrame({'value': range(n*m)}, index=index)
# Access all rows where level_1 == i
i = 0 # example value
result = df.loc[i]
This code creates a MultiIndex DataFrame and accesses all rows for a fixed first-level index value.
- Primary operation: Accessing rows by first-level index in MultiIndex.
- How many times: The operation scans or jumps through the subset of rows matching the first-level key, which is about m rows.
When we fix the first-level index and get all matching rows, the time depends on how many rows share that first-level key.
| Input Size (n x m) | Approx. Operations |
|---|---|
| 10 x 5 | 5 (rows for one first-level key) |
| 100 x 50 | 50 |
| 1000 x 100 | 100 |
Pattern observation: The time grows linearly with the size of the second-level index (m), not the total size (n*m).
Time Complexity: O(m)
This means accessing all rows for a fixed first-level index takes time proportional to the number of second-level entries under it.
[X] Wrong: "Accessing data by first-level index in MultiIndex is always O(n*m) because the whole DataFrame is scanned."
[OK] Correct: pandas uses efficient indexing to jump directly to the matching rows, so it only processes the subset for that first-level key, not the entire DataFrame.
Understanding how hierarchical indexing affects data access speed helps you explain efficient data querying in pandas, a key skill for data analysis roles.
"What if we accessed data by both levels of the MultiIndex at once? How would the time complexity change?"