
How to Flatten DataFrame Columns in pandas Easily

To flatten DataFrame columns in pandas when they have multiple levels (a MultiIndex), either join each column's levels into one string (with a list comprehension or df.columns.map) or call df.columns.to_flat_index() to get a single-level Index of tuples. Joining produces plain string names that are easier to access and analyze; to_flat_index() keeps each label as a tuple.

Syntax

When you have a pandas DataFrame with multi-level columns, you can flatten them using these common patterns:

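  • df.columns = ['_'.join(col).strip() for col in df.columns.values]: Joins each tuple of column levels into a single string separated by underscores, then strips leading and trailing whitespace. Every level value must be a string for the join to work.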
  • df.columns = df.columns.to_flat_index(): Converts MultiIndex columns to a flat Index of tuples.

These methods help convert complex column headers into simpler, one-level column names.

python
df.columns = ['_'.join(col).strip() for col in df.columns.values]
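
To make the difference between the two patterns concrete, here is a minimal sketch. The two-level column labels are made up for illustration. It shows that to_flat_index() returns a plain Index whose labels are still tuples, and that a join is needed afterwards if you want string names.

```python
import pandas as pd

# Hypothetical two-level columns, used only for illustration
columns = pd.MultiIndex.from_product([['A', 'B'], ['one', 'two']])
df = pd.DataFrame([[1, 2, 3, 4]], columns=columns)

# to_flat_index() returns a single-level Index; each label is still a tuple
flat = df.columns.to_flat_index()
print(isinstance(flat, pd.MultiIndex))
print(list(flat))

# Joining the tuples afterwards produces string labels
df.columns = ['_'.join(col) for col in flat]
print(list(df.columns))
```

Which form to keep depends on how the columns will be used: tuples preserve the level structure, while strings are simpler to type and to export.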

Example

This example shows how to flatten a DataFrame with multi-level columns into single-level columns by joining the levels with underscores.

python
import pandas as pd

# Create a sample DataFrame with multi-level columns
arrays = [['A', 'A', 'B', 'B'], ['one', 'two', 'one', 'two']]
columns = pd.MultiIndex.from_arrays(arrays, names=['upper', 'lower'])
data = [[1, 2, 3, 4], [5, 6, 7, 8]]
df = pd.DataFrame(data, columns=columns)

print('Original DataFrame with MultiIndex columns:')
print(df)

# Flatten the columns by joining levels with underscore
s_flat = ['_'.join(col) for col in df.columns]
df.columns = s_flat

print('\nFlattened DataFrame columns:')
print(df)
Output
Original DataFrame with MultiIndex columns:
upper   A       B
lower one two one two
0       1   2   3   4
1       5   6   7   8

Flattened DataFrame columns:
   A_one  A_two  B_one  B_two
0      1      2      3      4
1      5      6      7      8
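
As a rough illustration of why flat names are easier to work with, the sketch below rebuilds the same sample data and selects one column before and after flattening. The variable names before and after are only for this illustration.

```python
import pandas as pd

columns = pd.MultiIndex.from_arrays(
    [['A', 'A', 'B', 'B'], ['one', 'two', 'one', 'two']],
    names=['upper', 'lower'],
)
df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]], columns=columns)

# With MultiIndex columns, a single column is addressed by a tuple
before = df[('A', 'two')].tolist()

# After flattening, the same column is addressed by one string
df.columns = ['_'.join(col) for col in df.columns]
after = df['A_two'].tolist()

print(before, after)
```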

Common Pitfalls

One common mistake is flattening columns without checking whether they are multi-level. If the columns are already single-level strings, '_'.join(col) treats each name as a sequence of characters, so 'price' silently becomes 'p_r_i_c_e'. One-character names such as 'A' come through unchanged, which can hide the bug. Non-string labels such as integers raise TypeError: can only join an iterable.

Also, forgetting to assign the flattened columns back to df.columns means the DataFrame stays unchanged.

python
import pandas as pd

df = pd.DataFrame({'price': [1, 2], 'qty': [3, 4]})

# Wrong: single-level labels are plain strings, so join() splits them into characters
mangled = ['_'.join(col) for col in df.columns]
print(f'Unchecked join: {mangled}')

# Right: check if columns are MultiIndex before flattening
if isinstance(df.columns, pd.MultiIndex):
    df.columns = ['_'.join(col) for col in df.columns]
else:
    print('Columns are already single-level, no flattening needed.')
Output
Unchecked join: ['p_r_i_c_e', 'q_t_y']
Columns are already single-level, no flattening needed.
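
One way to guard against these pitfalls is to wrap the check and the join in a small function. The sketch below is a hypothetical helper, not part of pandas. The name flatten_columns and the sep parameter are assumptions made for this example. It converts each level to a string first, so integer levels do not break the join.

```python
import pandas as pd

def flatten_columns(df, sep='_'):
    """Return a copy of df whose MultiIndex columns are joined into strings.

    Hypothetical helper for illustration; not part of pandas.
    """
    out = df.copy()
    if isinstance(out.columns, pd.MultiIndex):
        # str() lets non-string levels (e.g. integers) take part in the join
        out.columns = [sep.join(str(level) for level in col) for col in out.columns]
    return out

# Made-up sample frames: one multi-level with an integer level, one single-level
multi = pd.DataFrame(
    [[1, 2]],
    columns=pd.MultiIndex.from_product([['sales'], [2023, 2024]]),
)
single = pd.DataFrame({'price': [1], 'qty': [2]})

print(list(flatten_columns(multi).columns))
print(list(flatten_columns(single).columns))
```

Because the helper returns a new DataFrame, the result still has to be assigned, for example df = flatten_columns(df).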

Quick Reference

Here is a quick summary of methods to flatten DataFrame columns:

| Method | Description | Example |
|---|---|---|
| Join levels with underscore | Concatenate multi-level column names into single strings | df.columns = ['_'.join(col) for col in df.columns] |
| Use to_flat_index() | Convert MultiIndex columns to flat Index of tuples | df.columns = df.columns.to_flat_index() |
| Convert tuples to strings | Map tuples to strings with custom separator | df.columns = df.columns.map(lambda x: '_'.join(x)) |
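
The map-based pattern can be sketched with a different separator. The sample labels and the '.' separator here are arbitrary choices for illustration.

```python
import pandas as pd

columns = pd.MultiIndex.from_product([['A', 'B'], ['one', 'two']])
df = pd.DataFrame([[1, 2, 3, 4]], columns=columns)

# Index.map calls the function once per tuple label; '.'.join is the bound
# str.join method, equivalent to lambda x: '.'.join(x)
df.columns = df.columns.map('.'.join)
print(list(df.columns))
```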

Key Takeaways

  • Flatten multi-level pandas DataFrame columns by joining the levels into single strings.
  • Always assign the flattened column names back to df.columns to update the DataFrame.
  • Check whether the columns are a MultiIndex before flattening, so single-level names are not mangled and non-string labels do not raise errors.
  • Use df.columns.to_flat_index() to convert a MultiIndex to flat tuples if preferred.
  • Flattened columns make data easier to access and analyze in pandas.