
Why MultiIndex enables hierarchical data in Pandas

Introduction

MultiIndex lets you organize data in layers, like folders inside folders. This helps you work with complex data that has multiple levels of categories.

Typical situations where a MultiIndex helps:
You have sales data by country, then by city, and want to analyze both levels together.
You want to group data by year and month to see trends over time.
You have survey results categorized by age group and gender and want to compare them easily.
You need to store and access data with multiple keys, like product category and product type.
You want to perform detailed slicing and filtering on data with several layers of labels.
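As a small sketch of the year-and-month case above, grouping by two keys returns a result whose index is a MultiIndex. The column names and figures here are invented purely for illustration.

```python
import pandas as pd

# Hypothetical sales records, invented for illustration
sales = pd.DataFrame({
    'Year': [2023, 2023, 2023, 2024, 2024],
    'Month': [1, 1, 2, 1, 2],
    'Amount': [100, 50, 200, 300, 400],
})

# Grouping by two columns returns a Series with a two-level MultiIndex
monthly = sales.groupby(['Year', 'Month'])['Amount'].sum()
print(monthly)

# Selecting on the outer level alone returns every month of that year
print(monthly.loc[2023])
```

The outer level (Year) acts like the parent folder and the inner level (Month) like the folders inside it.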
Syntax
Pandas
pandas.MultiIndex.from_tuples(tuples)
pandas.MultiIndex.from_product([list1, list2])
DataFrame.set_index([col1, col2])

You can create a MultiIndex from a list of tuples or from the Cartesian product of several lists.

Setting multiple columns as the index with set_index creates a MultiIndex automatically.

Examples
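This creates a MultiIndex with two levels: the first holds the country and the second holds the city. The levels are unnamed unless you pass the names argument.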
Pandas
import pandas as pd

# Create MultiIndex from tuples
index = pd.MultiIndex.from_tuples([('USA', 'NY'), ('USA', 'LA'), ('Canada', 'Toronto')])
print(index)
This creates a MultiIndex containing every combination of the two lists: ('A', 1), ('A', 2), ('B', 1) and ('B', 2).
Pandas
import pandas as pd

# Create MultiIndex from product of lists
index = pd.MultiIndex.from_product([['A', 'B'], [1, 2]])
print(index)
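To show how such an index is typically put to use, here is a minimal sketch that attaches it to a DataFrame. The level names 'Group' and 'Trial' and the scores are assumptions made up for this example.

```python
import pandas as pd

# Level names are hypothetical, chosen only to label the output
index = pd.MultiIndex.from_product(
    [['A', 'B'], [1, 2]],
    names=['Group', 'Trial'],
)

# One row per combination: 2 groups x 2 trials = 4 rows
scores = pd.DataFrame({'Score': [10, 20, 30, 40]}, index=index)
print(scores)

# Look up a single combination with a tuple
print(scores.loc[('B', 1), 'Score'])
```

from_product is convenient when every combination of labels should exist, for example when preparing an empty grid to fill in later.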
Setting two columns as the index creates a DataFrame with a MultiIndex.
Pandas
import pandas as pd

data = {'Country': ['USA', 'USA', 'Canada'], 'City': ['NY', 'LA', 'Toronto'], 'Sales': [100, 200, 150]}
df = pd.DataFrame(data)
df = df.set_index(['Country', 'City'])
print(df)
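Continuing with the same data, the following sketch shows what the resulting index looks like and how to undo it. It uses only the columns from the example above.

```python
import pandas as pd

data = {'Country': ['USA', 'USA', 'Canada'], 'City': ['NY', 'LA', 'Toronto'], 'Sales': [100, 200, 150]}
df = pd.DataFrame(data).set_index(['Country', 'City'])

# The former columns become named index levels
print(df.index.names)

# Selecting on the outer level returns all cities for that country
usa = df.loc['USA']
print(usa)

# reset_index turns the levels back into ordinary columns
flat = df.reset_index()
print(flat.columns.tolist())
```

Because set_index and reset_index are inverses here, you can move between the flat and hierarchical layouts as the analysis requires.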
Sample Program

This example shows how MultiIndex organizes data by department and employee. You can easily access specific entries using both levels.

Pandas
import pandas as pd

# Sample data with two levels: Department and Employee
data = {
    'Department': ['HR', 'HR', 'IT', 'IT'],
    'Employee': ['Alice', 'Bob', 'Charlie', 'David'],
    'Salary': [50000, 52000, 60000, 62000]
}
df = pd.DataFrame(data)

# Set MultiIndex using Department and Employee
multi_df = df.set_index(['Department', 'Employee'])

# Show the MultiIndex DataFrame
print(multi_df)

# Access salary of Bob in HR
print('\nSalary of Bob in HR:', multi_df.loc[('HR', 'Bob'), 'Salary'])
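Beyond looking up one entry, the two levels can be queried separately. The sketch below rebuilds the same department and employee data and shows a few such queries; sorting the index first is a precaution, not something the sample program requires.

```python
import pandas as pd

df = pd.DataFrame({
    'Department': ['HR', 'HR', 'IT', 'IT'],
    'Employee': ['Alice', 'Bob', 'Charlie', 'David'],
    'Salary': [50000, 52000, 60000, 62000],
})
multi_df = df.set_index(['Department', 'Employee']).sort_index()

# All employees in one department (outer level)
it_rows = multi_df.loc['IT']

# One employee by name, whatever the department (inner level)
charlie = multi_df.xs('Charlie', level='Employee')

# Average salary per department, grouping on the level name
avg_salary = multi_df.groupby(level='Department')['Salary'].mean()

print(it_rows, charlie, avg_salary, sep='\n\n')
```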
Important Notes

MultiIndex helps keep related data grouped and easy to access.

It can make data analysis clearer when working with multiple categories.

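To select a single entry, pass .loc a tuple with one label per level, such as ('HR', 'Bob'). A single label selects everything under that value of the outer level.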
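The different ways of passing tuples can be sketched as follows. The data mirrors the sample program; building the index with from_tuples and the names argument is just one of several equivalent ways to set it up.

```python
import pandas as pd

df = pd.DataFrame(
    {'Salary': [50000, 52000, 60000, 62000]},
    index=pd.MultiIndex.from_tuples(
        [('HR', 'Alice'), ('HR', 'Bob'), ('IT', 'Charlie'), ('IT', 'David')],
        names=['Department', 'Employee'],
    ),
)

# A full tuple (one label per level) plus a column selects one value
bob_salary = df.loc[('HR', 'Bob'), 'Salary']

# A list of tuples selects several rows at once
pair = df.loc[[('HR', 'Alice'), ('IT', 'David')]]

# pd.IndexSlice selects on an inner level; the index should be sorted for this
idx = pd.IndexSlice
bobs = df.loc[idx[:, 'Bob'], :]

print(bob_salary)
print(pair)
print(bobs)
```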

Summary

MultiIndex creates layers of labels for complex data.

It helps organize and access data with multiple categories easily.

Use MultiIndex to analyze hierarchical data clearly and efficiently.