
Pandas How to Convert Long to Wide Format Easily

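Use df.pivot(index='row_id', columns='variable', values='value') to convert a long pandas DataFrame to wide format when each index/column pair appears only once. Use df.pivot_table(index='row_id', columns='variable', values='value', aggfunc='sum') when the same pair can appear more than once and the duplicates need to be aggregated; if you omit aggfunc, pivot_table averages them.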

📋

Examples

Input: df = pd.DataFrame({'id': [1,1,2,2], 'var': ['A','B','A','B'], 'val': [10,20,30,40]})

Output: pivoted DataFrame with index 1, 2 and columns A, B; row 1 holds 10, 20 and row 2 holds 30, 40

Input: df = pd.DataFrame({'date': ['2023-01-01','2023-01-01','2023-01-02'], 'type': ['temp','humidity','temp'], 'value': [22,30,25]})

Output: wide DataFrame with dates as index and columns humidity, temp (sorted alphabetically); 2023-01-01 holds 30, 22 and 2023-01-02 holds NaN, 25

Input: df = pd.DataFrame({'id': [1,1,2], 'var': ['A','A','B'], 'val': [5,10,15]})

Output: pivot_table with aggfunc='sum' combines the duplicate entries for id=1, var=A into 15 (the default aggfunc, mean, would give 7.5)

🧠

How to Think About It

To convert long to wide format, think of spreading unique values from one column into new columns. Use one column as the new row index, another as new columns, and a third for the cell values. If duplicates exist, use aggregation to combine them.

📐

Algorithm

Identify the column to use as the new row index.

Identify the column whose unique values will become new columns.

Identify the column with values to fill the new table cells.

Use pivot or pivot_table to reshape the DataFrame accordingly.

If duplicates exist for the same index and column, use pivot_table with an aggregation function.
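The steps above can be sketched with the date/type example from the Examples section. This is a minimal illustration, not the only way to do it; the variable names are chosen here for readability.

```python
import pandas as pd

# Long format: one row per (date, type) measurement.
df = pd.DataFrame({
    'date': ['2023-01-01', '2023-01-01', '2023-01-02'],
    'type': ['temp', 'humidity', 'temp'],
    'value': [22, 30, 25],
})

# Steps 1-3: choose the row index, the column to spread, and the value column.
# Step 4: every (date, type) pair is unique here, so plain pivot is enough.
wide_df = df.pivot(index='date', columns='type', values='value')

# 2023-01-02 has no humidity reading, so that cell is filled with NaN.
print(wide_df)
```

Because a missing combination becomes NaN, the value columns are converted to float in this case.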

💻

Code

pandas

import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 2, 2],
    'var': ['A', 'B', 'A', 'B'],
    'val': [10, 20, 30, 40]
})

wide_df = df.pivot(index='id', columns='var', values='val')
print(wide_df)

Output

var   A   B
id
1    10  20
2    30  40

🔍

Dry Run

Let's trace the example DataFrame through the pivot operation

Original DataFrame

id: [1,1,2,2], var: ['A','B','A','B'], val: [10,20,30,40]

Set 'id' as index, 'var' as columns, 'val' as values

Create new table with rows 1,2 and columns A,B

Fill values

For id=1,var=A put 10; id=1,var=B put 20; id=2,var=A put 30; id=2,var=B put 40

id	var	val
1	A	10
1	B	20
2	A	30
2	B	40
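To make the trace concrete, the same fill can be done by hand with a plain loop. This is only a sketch of the idea, not how pandas implements pivot internally; the names cells and manual_df are made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 2, 2],
    'var': ['A', 'B', 'A', 'B'],
    'val': [10, 20, 30, 40],
})

# cells[column_label][row_label] = value, filled one long-format row at a time.
cells = {}
for row_id, var, val in zip(df['id'], df['var'], df['val']):
    cells.setdefault(var, {})[row_id] = val

# A dict of dicts becomes a DataFrame: outer keys are columns, inner keys are rows.
manual_df = pd.DataFrame(cells)
print(manual_df)

# The hand-built table holds the same values as the pivot result.
pivot_df = df.pivot(index='id', columns='var', values='val')
print(manual_df.values.tolist() == pivot_df.values.tolist())
```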

💡

Why This Works

Step 1: Choosing index, columns, and values

The index parameter sets the rows, columns sets the new columns, and values fills the cells.

Step 2: Using pivot

pivot turns each unique value of the columns argument into its own column and keeps one row per unique value of the index argument.

Step 3: Handling duplicates

If the same index/column pair appears more than once, pivot cannot decide which value goes in the cell and raises a ValueError; use pivot_table with an aggregation function such as sum to combine the duplicates.
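A small sketch of that failure and one possible fallback, using the duplicate example from the Examples section. Summing is only one choice of aggregation; pick whichever matches your data.

```python
import pandas as pd

# id=1, var='A' appears twice, so there are two candidate values for one cell.
df = pd.DataFrame({
    'id': [1, 1, 2],
    'var': ['A', 'A', 'B'],
    'val': [5, 10, 15],
})

try:
    wide_df = df.pivot(index='id', columns='var', values='val')
except ValueError as err:
    # pivot refuses to reshape when an index/column pair is duplicated.
    print(f"pivot failed: {err}")
    # Fall back to pivot_table and combine the duplicates by summing them.
    wide_df = df.pivot_table(index='id', columns='var', values='val', aggfunc='sum')

print(wide_df)
```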

🔄

Alternative Approaches

pivot_table with aggregation

pandas

import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 2],
    'var': ['A', 'A', 'B'],
    'val': [5, 10, 15]
})

wide_df = df.pivot_table(index='id', columns='var', values='val', aggfunc='sum')
print(wide_df)

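Use this when duplicates exist for the same index and column. With aggfunc='sum' the duplicate values are added together; if you leave aggfunc out, pivot_table uses the mean instead.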
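The choice of aggregation changes the result, so it is worth comparing a few options on the same data. The sketch below also uses the fill_value parameter of pivot_table to replace missing combinations with 0; whether 0 is a sensible placeholder depends on your data.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 2],
    'var': ['A', 'A', 'B'],
    'val': [5, 10, 15],
})

# Explicit sum: the two values for id=1, var='A' are added (5 + 10).
by_sum = df.pivot_table(index='id', columns='var', values='val', aggfunc='sum')

# No aggfunc given: pivot_table falls back to the mean ((5 + 10) / 2).
by_mean = df.pivot_table(index='id', columns='var', values='val')

# fill_value replaces the NaN cells for combinations that never occur.
filled = df.pivot_table(index='id', columns='var', values='val',
                        aggfunc='sum', fill_value=0)

print(by_sum)
print(by_mean)
print(filled)
```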

set_index and unstack

pandas

import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 2, 2],
    'var': ['A', 'B', 'A', 'B'],
    'val': [10, 20, 30, 40]
})

wide_df = df.set_index(['id', 'var'])['val'].unstack()
print(wide_df)

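This method gives the same result as pivot, but you first set a MultiIndex on the row and column keys and then unstack the inner level into columns. Like pivot, it raises an error if an (id, var) pair appears more than once.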
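As a quick check, the two approaches can be compared side by side. The sketch also shows one way to turn the wide result back into a flat table with id as an ordinary column, which is often convenient before exporting; the names via_pivot, via_unstack, and flat_df are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 2, 2],
    'var': ['A', 'B', 'A', 'B'],
    'val': [10, 20, 30, 40],
})

via_pivot = df.pivot(index='id', columns='var', values='val')
via_unstack = df.set_index(['id', 'var'])['val'].unstack()

# Both tables hold the same cell values.
print(via_pivot.values.tolist() == via_unstack.values.tolist())

# Move 'id' from the index back into a regular column and
# drop the leftover 'var' label that sits above the column names.
flat_df = via_unstack.reset_index()
flat_df.columns.name = None
print(flat_df)
```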

⚡

Complexity: O(n) time, O(n) space

Time Complexity

The operation scans all rows once to rearrange data, so it is linear in the number of rows.

Space Complexity

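A new DataFrame is created with one cell for every combination of unique index value and unique column value. When most combinations are present this is about the size of the input, but when the data is sparse the wide table can be much larger than the long one because the missing cells are stored as NaN.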
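A deliberately extreme sketch of the sparse case: every row has its own id and its own variable name, so the wide table is almost entirely NaN. The data is synthetic and only meant to show how the shape grows.

```python
import pandas as pd

n = 100

# Each of the n rows has a unique id and a unique variable name.
df = pd.DataFrame({
    'id': range(n),
    'var': [f'v{i}' for i in range(n)],
    'val': range(n),
})

wide_df = df.pivot(index='id', columns='var', values='val')

long_rows = len(df)                              # rows in the long table
wide_cells = wide_df.size                        # cells in the wide table
filled_cells = int(wide_df.notna().sum().sum())  # cells that hold real data

print(long_rows, wide_df.shape, wide_cells, filled_cells)
```

Here 100 input rows become a 100 x 100 table in which only 100 cells carry data.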

Which Approach is Fastest?

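pivot is usually the fastest because it only reshapes the data, but it requires every index/column pair to be unique; pivot_table is more flexible but does extra work because it groups and aggregates the rows before reshaping.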

Approach	Time	Space	Best For
pivot	O(n)	O(n)	No duplicates, simple reshaping
pivot_table	O(n)	O(n)	Duplicates present, need aggregation
set_index + unstack	O(n)	O(n)	Multi-index reshaping, flexible

💡

Use pivot_table instead of pivot if your data has duplicate entries for the same index and column.

⚠️

Calling pivot on data that has duplicate index/column pairs raises a ValueError; check for duplicates first, for example with df.duplicated on the index and column keys.
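One way to run that check is DataFrame.duplicated restricted to the key columns. The sketch below then picks pivot or pivot_table accordingly; using sum as the aggregation is an assumption made for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 2],
    'var': ['A', 'A', 'B'],
    'val': [5, 10, 15],
})

# keep=False marks every row that shares its (id, var) pair with another row.
dupes = df.duplicated(subset=['id', 'var'], keep=False)

if dupes.any():
    # Show the offending rows, then aggregate them.
    print(df[dupes])
    wide_df = df.pivot_table(index='id', columns='var', values='val', aggfunc='sum')
else:
    wide_df = df.pivot(index='id', columns='var', values='val')

print(wide_df)
```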