What is MultiIndex in pandas: Explanation and Example
MultiIndex is a way to have multiple levels of indexing on rows or columns in a DataFrame, allowing you to organize and access data with more complexity. It works like a hierarchy of labels, making it easier to work with grouped or nested data.How It Works
Think of a MultiIndex as a set of labels stacked on top of each other to identify rows or columns. Instead of just one label per row, you have multiple labels that form a hierarchy, like a family tree or a filing system with folders and subfolders.
This helps pandas organize data that naturally groups into categories and subcategories. For example, sales data might be grouped by year and then by month. The MultiIndex lets you easily select, filter, or summarize data at any level of this hierarchy.
Example
This example shows how to create a DataFrame with a MultiIndex on rows using two levels: 'City' and 'Year'. It then demonstrates how the data looks and how you can access it.
import pandas as pd # Create tuples for multi-level index index = pd.MultiIndex.from_tuples([ ('New York', 2020), ('New York', 2021), ('Los Angeles', 2020), ('Los Angeles', 2021) ], names=['City', 'Year']) # Create DataFrame with MultiIndex data = pd.DataFrame({'Sales': [250, 270, 190, 210]}, index=index) print(data) # Access data for New York print(data.loc['New York'])
When to Use
Use MultiIndex when your data naturally fits into multiple categories or levels. It is helpful for:
- Time series data grouped by year, month, day
- Geographical data grouped by country, state, city
- Sales or survey data grouped by product category and subcategory
- Any dataset where you want to perform grouped analysis or pivot tables
This structure makes it easier to slice and dice data without losing the context of the groups.
Key Points
- MultiIndex allows multiple levels of row or column labels.
- It helps organize complex, hierarchical data clearly.
- You can select data at any level of the index easily.
- It is useful for grouped data analysis and pivot operations.