A MultiIndex lets you organize data under multiple index levels. Selecting on those levels lets you retrieve specific rows or groups directly.
Selecting data with MultiIndex in Pandas
Introduction
You have sales data by year and month and want to get data for a specific year.
You track students by class and subject and want to see scores for one class.
You store weather data by city and date and want to find data for one city on a certain day.
You analyze product data by category and subcategory and want to select one subcategory.
Syntax
Pandas
df.loc[(level1_value, level2_value, ...)]
Use tuples inside loc to select data by multiple index levels.
To select on fewer levels, pass a partial tuple that gives values for only the leading (outermost) levels.
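The difference between full and partial tuples is easiest to see with more than two levels. The following is a minimal sketch, assuming a hypothetical three-level index (Category, Subcategory, Product) and made-up unit counts; none of these names come from the examples in this tutorial.

```python
import pandas as pd

# Hypothetical product data with three index levels (illustrative values only).
index = pd.MultiIndex.from_tuples(
    [
        ("Electronics", "Laptops", "L1"),
        ("Electronics", "Phones", "P1"),
        ("Electronics", "Phones", "P2"),
        ("Furniture", "Chairs", "C1"),
    ],
    names=["Category", "Subcategory", "Product"],
)
products = pd.DataFrame({"Units": [12, 30, 18, 8]}, index=index)

# Partial tuple with one level: every row under "Electronics".
one_level = products.loc[("Electronics",)]

# Partial tuple with two levels: every row under "Electronics" > "Phones".
two_levels = products.loc[("Electronics", "Phones")]

# Full tuple: one value per level identifies a single row.
full = products.loc[("Electronics", "Phones", "P2")]

print(one_level)
print(two_levels)
print(full)
```

The index in this sketch is listed in sorted order on purpose. pandas can warn about performance when a multi-level key is looked up in an unsorted MultiIndex, and calling `sort_index()` first is one way to avoid that.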
Examples
Selects data where the first index level is '2023' and the second is 'January'.
Pandas
df.loc[('2023', 'January')]
Selects all data for the first index level '2023', regardless of the second level.
Pandas
df.loc['2023']
Selects all rows where the first index level is 'ClassA'.
Pandas
df.loc[('ClassA',)]
Selects data for 'City1' on '2023-06-01'.
Pandas
df.loc[('City1', '2023-06-01')]
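The examples above use two spellings for a first-level selection: a bare label and a one-element tuple. As a hedged illustration, the sketch below builds a small, hypothetical table of student scores (the subjects and numbers are invented) and checks that both spellings give the same result.

```python
import pandas as pd

# Hypothetical scores indexed by Class and Subject (illustrative values only).
index = pd.MultiIndex.from_tuples(
    [
        ("ClassA", "Math"),
        ("ClassA", "Science"),
        ("ClassB", "Math"),
        ("ClassB", "Science"),
    ],
    names=["Class", "Subject"],
)
scores = pd.DataFrame({"Score": [88, 92, 75, 81]}, index=index)

# A bare label and a one-element tuple both select on the first level.
by_label = scores.loc["ClassA"]
by_tuple = scores.loc[("ClassA",)]
same = by_label.equals(by_tuple)

# A full tuple narrows the selection to one row.
class_a_math = scores.loc[("ClassA", "Math")]

print(by_label)
print("Same result from both spellings:", same)
print(class_a_math)
```

The bare label is shorter. The tuple form keeps the code consistent when other selections in the same script use multi-level tuples.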
Sample Program
This code creates a DataFrame with two index levels: Year and Month. It then selects all sales data for 2023 and sales data for January 2024 using loc with MultiIndex.
Pandas
import pandas as pd

# Create a MultiIndex DataFrame
index = pd.MultiIndex.from_tuples(
    [('2023', 'January'), ('2023', 'February'), ('2024', 'January'), ('2024', 'February')],
    names=['Year', 'Month']
)
data = {'Sales': [100, 150, 200, 250]}
df = pd.DataFrame(data, index=index)

# Select data for year 2023
sales_2023 = df.loc['2023']

# Select data for year 2024 and month January
sales_2024_jan = df.loc[('2024', 'January')]

print("Sales in 2023:")
print(sales_2023)
print("\nSales in January 2024:")
print(sales_2024_jan)
Important Notes
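If you select a single row of a DataFrame by giving a value for every index level (and the index is unique), the result is a Series whose labels are the column names.
Partial selection returns a DataFrame with all matching rows; the levels you selected on are dropped from the result's index.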
MultiIndex selection is very useful for grouped or hierarchical data.
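The notes above about return types can be checked directly. This sketch reuses the Year/Month sales figures from the sample program; listing the months in sorted order is a choice made here, not something the sample program does.

```python
import pandas as pd

# Same sales figures as the sample program, with months listed in sorted order.
index = pd.MultiIndex.from_tuples(
    [("2023", "February"), ("2023", "January"),
     ("2024", "February"), ("2024", "January")],
    names=["Year", "Month"],
)
df = pd.DataFrame({"Sales": [150, 100, 250, 200]}, index=index)

# Full tuple: a single row comes back as a Series.
full = df.loc[("2024", "January")]

# Partial selection: matching rows come back as a DataFrame,
# and the selected "Year" level is dropped from the index.
partial = df.loc["2023"]

print(type(full).__name__)
print(type(partial).__name__)
print(partial.index.name)
```

If the result should keep every index level, `df.xs("2023", level="Year", drop_level=False)` is one alternative.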
Summary
Use df.loc with tuples to select data by MultiIndex levels.
Partial tuples select all data matching the given levels.
Full tuples select a single row as a Series.