
pivot vs pivot_table in pandas: Key Differences and Usage

In pandas, pivot reshapes data without aggregation and requires unique index/column pairs, while pivot_table can aggregate data with a function like mean or sum when duplicates exist. Use pivot_table for flexible aggregation and pivot for simple reshaping with unique data.

⚖️

Quick Comparison

Here is a quick side-by-side comparison of pivot and pivot_table in pandas.

Feature	pivot	pivot_table
Purpose	Reshape data without aggregation	Reshape data with aggregation support
Handles duplicates	No, raises ValueError on duplicate index/column pairs	Yes, aggregates duplicates
Aggregation function	No aggregation function	Supports aggfunc like mean, sum
Default aggregation	N/A	Mean
Flexibility	Less flexible	More flexible
Use case	Simple reshaping with unique pairs	Complex reshaping with duplicates

⚖️

Key Differences

The pivot function in pandas is designed for simple reshaping of data where each combination of index and columns is unique. It performs no aggregation, so if the same index/column pair appears more than once it raises a ValueError instead of choosing which value to keep. This makes it fast and straightforward, but limited to data that is already free of duplicates.
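To make the duplicate restriction concrete, here is a minimal sketch of what happens when pivot meets a repeated index/column pair. The sample data and the variable name pivot_error are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: the ('2023-01-01', 'NY') pair appears twice.
df = pd.DataFrame({
    "Date": ["2023-01-01", "2023-01-01", "2023-01-02"],
    "City": ["NY", "NY", "LA"],
    "Temperature": [30, 31, 75],
})

try:
    df.pivot(index="Date", columns="City", values="Temperature")
    pivot_error = None
except ValueError as exc:
    # pandas refuses to reshape because one cell would need two values.
    pivot_error = exc

print(type(pivot_error).__name__, pivot_error)
```

Catching the exception here is only for demonstration; in practice the error is a useful signal that the data needs deduplicating or aggregating first.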

On the other hand, pivot_table is more powerful and flexible. It can handle duplicate entries by applying an aggregation function such as mean, sum, or any custom function. By default, it uses the mean to aggregate duplicates. This makes pivot_table suitable for summarizing and reshaping data where duplicates or multiple values per group exist.
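The sketch below illustrates the three aggregation styles mentioned above: the default mean, a built-in function passed by name, and a custom function. The data and the helper name value_range are assumptions chosen for this example.

```python
import pandas as pd

# Hypothetical sample data with two readings per Date/City pair.
df = pd.DataFrame({
    "Date": ["2023-01-01", "2023-01-01", "2023-01-02", "2023-01-02"],
    "City": ["NY", "NY", "LA", "LA"],
    "Temperature": [30, 31, 75, 77],
})

# No aggfunc given: duplicates are averaged, since the default is the mean.
default_mean = df.pivot_table(index="Date", columns="City", values="Temperature")

# A built-in aggregation selected by name.
total = df.pivot_table(index="Date", columns="City", values="Temperature", aggfunc="sum")

# A custom function (hypothetical): spread between highest and lowest reading.
def value_range(series):
    return series.max() - series.min()

spread = df.pivot_table(index="Date", columns="City", values="Temperature", aggfunc=value_range)

print(default_mean)
print(total)
print(spread)
```

Date/City combinations that have no rows in the input (for example LA on 2023-01-01) show up as NaN in all three results.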

Additionally, pivot_table accepts a list of aggregation functions, can add row and column totals with margins=True, and can replace empty cells with a chosen value through fill_value. pivot offers none of these. Therefore, pivot_table is preferred for data analysis tasks that require aggregation, while pivot is best for quick reshaping when the index/column pairs are already unique.
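As a rough illustration of these extra options, the following sketch passes a list of functions to aggfunc and then combines margins with fill_value. The data is invented for the example, and the variable names multi and summary are arbitrary.

```python
import pandas as pd

# Hypothetical sample data with two readings per Date/City pair.
df = pd.DataFrame({
    "Date": ["2023-01-01", "2023-01-01", "2023-01-02", "2023-01-02"],
    "City": ["NY", "NY", "LA", "LA"],
    "Temperature": [30, 31, 75, 77],
})

# A list of functions yields one group of columns per function.
multi = df.pivot_table(
    index="Date", columns="City", values="Temperature",
    aggfunc=["mean", "max"],
)

# margins=True appends an "All" row and column holding the totals;
# fill_value=0 replaces cells that have no underlying data.
summary = df.pivot_table(
    index="Date", columns="City", values="Temperature",
    aggfunc="sum", margins=True, fill_value=0,
)

print(multi)
print(summary)
```

Note that filling with 0 is only sensible when 0 cannot be mistaken for a real measurement; for temperatures, leaving the cell as NaN may be the safer choice.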

⚖️

Code Comparison

Example using pivot to reshape data without duplicates.

python

import pandas as pd

data = {'Date': ['2023-01-01', '2023-01-02', '2023-01-01', '2023-01-02'],
        'City': ['NY', 'NY', 'LA', 'LA'],
        'Temperature': [30, 32, 75, 77]}
df = pd.DataFrame(data)

# Each Date/City pair occurs exactly once, so every value gets its own cell.
pivot_df = df.pivot(index='Date', columns='City', values='Temperature')
print(pivot_df)

Output

City        LA  NY
Date
2023-01-01  75  30
2023-01-02  77  32

↔️

pivot_table Equivalent

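Using pivot_table on data that contains duplicate Date/City pairs (NY appears twice on 2023-01-01 and LA twice on 2023-01-02), aggregating them with the mean. Calling pivot on this data would raise a ValueError. Combinations with no data, such as LA on 2023-01-01, appear as NaN in the result.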

python

import pandas as pd

data = {'Date': ['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-02'],
        'City': ['NY', 'NY', 'LA', 'LA'],
        'Temperature': [30, 31, 75, 77]}
df = pd.DataFrame(data)

# Duplicate Date/City pairs are averaged into a single cell.
pivot_table_df = df.pivot_table(index='Date', columns='City', values='Temperature', aggfunc='mean')
print(pivot_table_df)

Output

City          LA    NY
Date
2023-01-01   NaN  30.5
2023-01-02  76.0   NaN

🎯

When to Use Which

Choose pivot when your data has unique index and column pairs and you want a simple reshaping without aggregation. It is faster and simpler but only works if there are no duplicates.

Choose pivot_table when your data contains duplicates or you want to aggregate values during reshaping. It offers more flexibility with aggregation functions and handling missing data.
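One way to turn this rule of thumb into code is to check for duplicate index/column pairs before reshaping. The helper below is a hypothetical sketch, not part of pandas; its name reshape and its default of 'mean' are assumptions made for this example.

```python
import pandas as pd

def reshape(df, index, columns, values, aggfunc="mean"):
    """Hypothetical helper: pivot when pairs are unique, else pivot_table."""
    has_duplicates = df.duplicated(subset=[index, columns]).any()
    if has_duplicates:
        # Several rows map to the same cell, so they must be aggregated.
        return df.pivot_table(index=index, columns=columns, values=values, aggfunc=aggfunc)
    # Every pair is unique, so a plain reshape is enough.
    return df.pivot(index=index, columns=columns, values=values)

# Unique pairs: handled by pivot.
unique_df = pd.DataFrame({
    "Date": ["2023-01-01", "2023-01-02", "2023-01-01", "2023-01-02"],
    "City": ["NY", "NY", "LA", "LA"],
    "Temperature": [30, 32, 75, 77],
})
unique_result = reshape(unique_df, "Date", "City", "Temperature")

# Duplicate pairs: handled by pivot_table with the mean.
dup_df = pd.DataFrame({
    "Date": ["2023-01-01", "2023-01-01", "2023-01-02", "2023-01-02"],
    "City": ["NY", "NY", "LA", "LA"],
    "Temperature": [30, 31, 75, 77],
})
dup_result = reshape(dup_df, "Date", "City", "Temperature")

print(unique_result)
print(dup_result)
```

Silently aggregating can hide data-quality problems, so in some pipelines it may be preferable to let pivot raise and inspect the duplicates instead.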

✅

Key Takeaways

pivot reshapes data without aggregation and requires unique pairs.

pivot_table supports aggregation and handles duplicates by combining them with an aggregation function.

Use pivot for simple, clean data reshaping.

Use pivot_table for complex reshaping with aggregation needs.

Default aggregation in pivot_table is mean but can be customized.