How to Reshape Data in Python: Simple Guide with Examples

To reshape data in Python, use the pandas library's pivot(), melt(), stack(), and unstack() methods. They change the layout of a DataFrame by converting between wide format (one column per category) and long format (one row per observation), or by moving labels between the row index and the columns.

Syntax

Here are common pandas methods to reshape data:

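pivot(index, columns, values): Converts long data to wide format; each unique value in columns becomes a new column (pass the arguments as keywords).
melt(id_vars, value_vars): Converts wide data to long format.
stack(): Moves the column labels into the innermost level of the row index, producing a MultiIndex; the table gets taller (with a single level of columns, the result is a Series).
unstack(): Moves the innermost level of the row index into the columns; the table gets wider. It is the inverse of stack().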

python

import pandas as pd

# pivot syntax
# df.pivot(index='row_id', columns='column_id', values='value_column')

# melt syntax
# pd.melt(df, id_vars=['id_columns'], value_vars=['value_columns'])

# stack syntax
# df.stack()

# unstack syntax
# df.unstack()
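As a minimal sketch of stack() and unstack() working as a pair, the code below reuses the temperatures from the example that follows, arranged as an already-wide table (the variable names wide, tall, and restored are illustrative). stack() turns the city columns into an inner index level and returns a Series; unstack() with no arguments moves that innermost level back into the columns.

```python
import pandas as pd

# Assumed sample data: one row per date, one column per city
wide = pd.DataFrame(
    {"Los Angeles": [60, 65], "New York": [30, 28]},
    index=pd.Index(["2024-01-01", "2024-01-02"], name="Date"),
)
wide.columns.name = "City"

# stack(): the column labels become the innermost index level (a Series with a MultiIndex)
tall = wide.stack()
print(tall)
print(list(tall.index.names))

# unstack(): the innermost index level (City) moves back into the columns
restored = tall.unstack()
print(restored)
print(restored.equals(wide))
```

Because the data has no missing combinations, the round trip returns the same table with its integer dtype intact.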

Example

This example shows how to reshape a simple data frame from long to wide format using pivot() and back to long format using melt().

python

import pandas as pd

data = {
    'Date': ['2024-01-01', '2024-01-01', '2024-01-02', '2024-01-02'],
    'City': ['New York', 'Los Angeles', 'New York', 'Los Angeles'],
    'Temperature': [30, 60, 28, 65]
}
df = pd.DataFrame(data)

# Reshape from long to wide format
wide_df = df.pivot(index='Date', columns='City', values='Temperature')
print('Wide format:')
print(wide_df)

# Reshape back to long format (value_vars order sets the row order of the result)
long_df = wide_df.reset_index().melt(
    id_vars='Date',
    value_vars=['Los Angeles', 'New York'],
    var_name='City',
    value_name='Temperature',
)
print('\nLong format:')
print(long_df)

Output

Wide format:
City        Los Angeles  New York
Date
2024-01-01           60        30
2024-01-02           65        28

Long format:
         Date         City  Temperature
0  2024-01-01  Los Angeles           60
1  2024-01-02  Los Angeles           65
2  2024-01-01     New York           30
3  2024-01-02     New York           28
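One way to confirm that a pivot()/melt() round trip lost nothing is to compare the shapes and then compare the rows while ignoring their order, because melt() returns the rows grouped by city rather than in their original order. This sketch uses the same sample data; value_vars is omitted here, so melt() unpivots every column not listed in id_vars. The itertuples()-based comparison is just one illustrative check, not the only option.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["2024-01-01", "2024-01-01", "2024-01-02", "2024-01-02"],
    "City": ["New York", "Los Angeles", "New York", "Los Angeles"],
    "Temperature": [30, 60, 28, 65],
})

wide_df = df.pivot(index="Date", columns="City", values="Temperature")
long_df = wide_df.reset_index().melt(
    id_vars="Date", var_name="City", value_name="Temperature"
)

# Shapes: long (4 rows x 3 cols) -> wide (2 dates x 2 cities) -> long again
print(df.shape, wide_df.shape, long_df.shape)

# Compare rows as plain tuples, sorted so that row order does not matter
same_rows = sorted(df.itertuples(index=False, name=None)) == sorted(
    long_df.itertuples(index=False, name=None)
)
print(same_rows)
```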

Common Pitfalls

Common mistakes when reshaping data include:

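Calling pivot() when the same index/column pair appears more than once; pivot() cannot decide which value to keep and raises a ValueError.
Calling melt() while the identifier (such as Date after a pivot()) is still in the index; melt() works on columns and discards the index by default, so the identifier is lost unless you call reset_index() first.
Mixing up stack() and unstack(): stack() makes a table taller by moving columns into the index, while unstack() makes it wider by moving an index level into the columns.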

Always check your data for duplicates and understand the shape you want before reshaping.

python

import pandas as pd

data = {
    'Date': ['2024-01-01', '2024-01-01', '2024-01-01'],
    'City': ['New York', 'New York', 'Los Angeles'],
    'Temperature': [30, 32, 60]
}
df = pd.DataFrame(data)

# pivot() raises ValueError: ('2024-01-01', 'New York') appears twice
try:
    df.pivot(index='Date', columns='City', values='Temperature')
except ValueError as e:
    print('Error:', e)

# Correct approach: use pivot_table with aggregation
pivot_table = df.pivot_table(index='Date', columns='City', values='Temperature', aggfunc='mean')
print('\nPivot table with aggregation:')
print(pivot_table)

Output

Error: Index contains duplicate entries, cannot reshape

Pivot table with aggregation:
City        Los Angeles  New York
Date
2024-01-01         60.0      31.0
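The second pitfall is easy to reproduce. In this sketch (sample values assumed, matching the earlier wide table), calling melt() directly on a table whose dates sit in the index silently drops them, while calling reset_index() first turns Date into an ordinary column that can be passed to id_vars.

```python
import pandas as pd

# Assumed sample data: Date lives in the index, as it does after pivot()
wide_df = pd.DataFrame(
    {"Los Angeles": [60, 65], "New York": [30, 28]},
    index=pd.Index(["2024-01-01", "2024-01-02"], name="Date"),
)

# Without reset_index(): melt() discards the index by default, so Date is gone
lost = wide_df.melt(var_name="City", value_name="Temperature")
print(lost.columns.tolist())  # ['City', 'Temperature']

# With reset_index(): Date becomes a column and is kept via id_vars
kept = wide_df.reset_index().melt(
    id_vars="Date", var_name="City", value_name="Temperature"
)
print(kept.columns.tolist())  # ['Date', 'City', 'Temperature']
```

Passing ignore_index=False to melt() is another option; it keeps the original index (repeated per row) instead of turning it into a column.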

Quick Reference

Method	Purpose	Key Parameters
pivot()	Convert long to wide format	index, columns, values
melt()	Convert wide to long format	id_vars, value_vars, var_name, value_name
stack()	Move column labels into the row index	level (optional)
unstack()	Move a row-index level into the columns	level (optional)
pivot_table()	Pivot with aggregation for duplicates	index, columns, values, aggfunc
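The optional level parameter of unstack() chooses which index level becomes the columns. A minimal sketch, assuming a Series indexed by (Date, City) built from the article's sample values: the default (level=-1, the innermost level) spreads cities across the columns, while level="Date" spreads dates instead.

```python
import pandas as pd

# Assumed long data with a two-level index (Date, City)
s = pd.Series(
    [30, 60, 28, 65],
    index=pd.MultiIndex.from_tuples(
        [
            ("2024-01-01", "New York"),
            ("2024-01-01", "Los Angeles"),
            ("2024-01-02", "New York"),
            ("2024-01-02", "Los Angeles"),
        ],
        names=["Date", "City"],
    ),
)

by_city = s.unstack()                # default: innermost level (City) -> columns
by_date = s.unstack(level="Date")    # named level: Date -> columns, City stays as rows
print(by_city)
print(by_date)
```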

Key Takeaways

Use pandas methods like pivot(), melt(), stack(), and unstack() to reshape data frames.

pivot() requires unique index-column pairs; use pivot_table() with aggfunc for duplicates.

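melt() converts wide data to long format; list the identifier columns in id_vars so they are repeated on every output row.

Check the shape (for example with df.shape) before and after reshaping to confirm the result is what you expect.

Call reset_index() before melt() when the identifiers live in the index, since melt() discards the index by default.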