Pandas: How to Convert Long to Wide Format Easily
Use df.pivot(index='row_id', columns='variable', values='value') or df.pivot_table(index='row_id', columns='variable', values='value') to convert a long pandas DataFrame to wide format.
Examples
How to Think About It
Algorithm
Code
import pandas as pd

# Long format: one row per (id, var) pair
df = pd.DataFrame({
    'id': [1, 1, 2, 2],
    'var': ['A', 'B', 'A', 'B'],
    'val': [10, 20, 30, 40]
})

# Wide format: one row per id, one column per var
wide_df = df.pivot(index='id', columns='var', values='val')
print(wide_df)
Dry Run
Let's trace the example DataFrame through the pivot operation.
Original DataFrame
id: [1,1,2,2], var: ['A','B','A','B'], val: [10,20,30,40]
Set 'id' as index, 'var' as columns, 'val' as values
Create new table with rows 1,2 and columns A,B
Fill values
For id=1,var=A put 10; id=1,var=B put 20; id=2,var=A put 30; id=2,var=B put 40
| id | var | val |
|---|---|---|
| 1 | A | 10 |
| 1 | B | 20 |
| 2 | A | 30 |
| 2 | B | 40 |
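The fill step traced above can be mimicked by hand, which may make it easier to see what pivot does with each row. This is a minimal sketch in plain Python, not how pandas implements the operation internally; the names `rows` and `wide` are illustrative only.

```python
# Long-format records from the dry run: (id, var, val)
rows = [(1, 'A', 10), (1, 'B', 20), (2, 'A', 30), (2, 'B', 40)]

# Build {id: {var: val}} by placing each value at its (row, column) cell
wide = {}
for row_id, var, val in rows:
    wide.setdefault(row_id, {})[var] = val

print(wide)
# {1: {'A': 10, 'B': 20}, 2: {'A': 30, 'B': 40}}
```

Each outer key corresponds to a row of the wide table and each inner key to one of its columns.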
Why This Works
Step 1: Choosing index, columns, and values
The index parameter sets the rows, columns sets the new columns, and values fills the cells.
Step 2: Using pivot
pivot reshapes the DataFrame so that each unique value in the columns field becomes its own column, with one row per unique index value.
Step 3: Handling duplicates
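If more than one row shares the same index and column pair, pivot cannot decide which value goes in the cell and raises a ValueError. Use pivot_table with an aggregation function such as sum instead; if you omit aggfunc, pivot_table averages the duplicates (the default is mean).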
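To see the duplicate problem in practice, the sketch below feeds pivot a small made-up DataFrame in which the pair (id=1, var='A') appears twice. The flag `pivot_failed` is just an illustrative variable for recording the outcome.

```python
import pandas as pd

# Two rows share the same (id, var) pair, so the target cell is ambiguous
df = pd.DataFrame({
    'id': [1, 1, 2],
    'var': ['A', 'A', 'B'],
    'val': [5, 10, 15]
})

try:
    df.pivot(index='id', columns='var', values='val')
    pivot_failed = False
except ValueError as err:
    # pandas refuses to reshape when an (index, column) pair is repeated
    pivot_failed = True
    print(f"pivot failed: {err}")
```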
Alternative Approaches
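import pandas as pd

# (id=1, var='A') appears twice, so plain pivot would fail here
df = pd.DataFrame({
    'id': [1, 1, 2],
    'var': ['A', 'A', 'B'],
    'val': [5, 10, 15]
})

# pivot_table combines the duplicates with the given aggregation
wide_df = df.pivot_table(index='id', columns='var', values='val', aggfunc='sum')
print(wide_df)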
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 2, 2],
    'var': ['A', 'B', 'A', 'B'],
    'val': [10, 20, 30, 40]
})

# Move id and var into a MultiIndex, then unstack the inner level (var) into columns
wide_df = df.set_index(['id', 'var'])['val'].unstack()
print(wide_df)
Complexity: O(n) time, O(n) space
Time Complexity
The operation scans all rows once to rearrange data, so it is linear in the number of rows.
Space Complexity
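A new DataFrame is created with one row per unique index value and one column per unique value in the pivot column. When every index and column combination is present, that is n cells, hence O(n). When combinations are missing, the gaps are filled with NaN, so the result can hold more cells than the input has rows.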
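To see how the size of the wide result depends on the number of unique index and column values rather than only on the row count, here is a small sketch with deliberately sparse, made-up data in which each id has a value for just one variable.

```python
import pandas as pd

# Hypothetical sparse data: 3 long rows, each id has only one variable
df = pd.DataFrame({
    'id': [1, 2, 3],
    'var': ['A', 'B', 'C'],
    'val': [10, 20, 30]
})

wide_df = df.pivot(index='id', columns='var', values='val')

n_rows_long = len(df)                           # rows in the long input
wide_shape = wide_df.shape                      # (unique ids, unique vars)
n_missing = int(wide_df.isna().sum().sum())     # cells filled with NaN

print(n_rows_long)   # 3
print(wide_shape)    # (3, 3)
print(n_missing)     # 6
```

Three input rows become nine cells, six of which are NaN placeholders for combinations that never occurred.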
Which Approach is Fastest?
pivot is faster but requires no duplicates; pivot_table is more flexible but slightly slower due to aggregation.
| Approach | Time | Space | Best For |
|---|---|---|---|
| pivot | O(n) | O(n) | No duplicates, simple reshaping |
| pivot_table | O(n) | O(n) | Duplicates present, need aggregation |
| set_index + unstack | O(n) | O(n) | Multi-index reshaping, flexible |
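On duplicate-free data the three approaches should produce the same cells, so the choice comes down to flexibility and speed. The sketch below checks that equivalence on the example data; it compares values only, since dtypes may differ between approaches. For real speed differences, timing each call on your own data (for example with the standard timeit module) is likely to be more informative than the asymptotic figures.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 2, 2],
    'var': ['A', 'B', 'A', 'B'],
    'val': [10, 20, 30, 40]
})

by_pivot = df.pivot(index='id', columns='var', values='val')
by_pivot_table = df.pivot_table(index='id', columns='var', values='val', aggfunc='sum')
by_unstack = df.set_index(['id', 'var'])['val'].unstack()

# Compare cell values via plain dicts of the form {column: {index: value}}
same_cells = by_pivot.to_dict() == by_pivot_table.to_dict() == by_unstack.to_dict()
print(same_cells)
```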
Use pivot_table instead of pivot if your data has duplicate entries for the same index and column.
Calling pivot on data with duplicates raises an error; always check for duplicates first.
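One way to check for duplicates before reshaping is DataFrame.duplicated with the index and column fields as the subset. The following is a sketch under the assumption that summing duplicates is acceptable for your data; the names `dupes` and `has_duplicates` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 2],
    'var': ['A', 'A', 'B'],
    'val': [5, 10, 15]
})

# keep=False marks every row that shares its (id, var) pair with another row
dupes = df[df.duplicated(subset=['id', 'var'], keep=False)]
has_duplicates = not dupes.empty

if has_duplicates:
    # Assumption: summing is the right way to combine duplicates here
    wide_df = df.pivot_table(index='id', columns='var', values='val', aggfunc='sum')
else:
    wide_df = df.pivot(index='id', columns='var', values='val')

print(dupes)
print(wide_df)
```

Inspecting `dupes` before choosing an aggregation helps confirm whether the repeated rows are genuine measurements or data-entry errors.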