
pivot vs pivot_table in pandas: Key Differences and Usage

In pandas, pivot reshapes data without aggregation and requires unique index/column pairs, while pivot_table can aggregate data with a function like mean or sum when duplicates exist. Use pivot_table for flexible aggregation and pivot for simple reshaping with unique data.

Quick Comparison

Here is a quick side-by-side comparison of pivot and pivot_table in pandas.

| Feature | pivot | pivot_table |
| --- | --- | --- |
| Purpose | Reshape data without aggregation | Reshape data with aggregation support |
| Handles duplicates | No, raises an error if duplicates exist | Yes, aggregates duplicates |
| Aggregation function | No aggregation function | Supports aggfunc like mean, sum |
| Default aggregation | N/A | Mean |
| Flexibility | Less flexible | More flexible |
| Use case | Simple reshaping with unique pairs | Complex reshaping with duplicates |

Key Differences

The pivot function in pandas is designed for simple reshaping of data where each combination of index and columns is unique. It does not allow duplicate entries for the same index/column pair and raises a ValueError if duplicates exist. This makes it straightforward to use but limited to data that is already free of duplicates.
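To make the duplicate restriction concrete, here is a minimal sketch. The small DataFrame is made up for illustration, and the variable pivot_failed is only there to record what happened.

```python
import pandas as pd

# Illustrative data: the (Date, City) pair ('2023-01-01', 'NY') appears twice.
df = pd.DataFrame({
    'Date': ['2023-01-01', '2023-01-01', '2023-01-02'],
    'City': ['NY', 'NY', 'LA'],
    'Temperature': [30, 31, 75],
})

try:
    df.pivot(index='Date', columns='City', values='Temperature')
    pivot_failed = False
except ValueError as exc:
    # pivot cannot decide which of the two NY values to keep, so it refuses.
    pivot_failed = True
    print(f"pivot failed: {exc}")
```

Removing one of the duplicate rows, or aggregating them first, lets the same pivot call succeed.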

On the other hand, pivot_table is more powerful and flexible. It can handle duplicate entries by applying an aggregation function such as mean, sum, or any custom function. By default, it uses the mean to aggregate duplicates. This makes pivot_table suitable for summarizing and reshaping data where duplicates or multiple values per group exist.

Additionally, pivot_table supports multiple aggregation functions (a list passed to aggfunc), margins (row and column totals via margins=True), and filling missing values (fill_value), none of which pivot offers. Therefore, pivot_table is preferred for data analysis tasks that require aggregation, while pivot is best for quick reshaping when the index/column pairs are already unique.
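The sketch below shows these extra options on a small made-up DataFrame. The variable names multi and totals are illustrative only; aggfunc, fill_value, and margins are the pivot_table parameters involved.

```python
import pandas as pd

# Illustrative data with duplicate (Date, City) pairs.
df = pd.DataFrame({
    'Date': ['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-02'],
    'City': ['NY', 'NY', 'LA', 'LA'],
    'Temperature': [30, 31, 75, 77],
})

# Several aggregations at once: one column group per function.
multi = df.pivot_table(index='Date', columns='City', values='Temperature',
                       aggfunc=['mean', 'max'])

# Sums with an 'All' row and column, and 0 for Date/City pairs with no rows.
totals = df.pivot_table(index='Date', columns='City', values='Temperature',
                        aggfunc='sum', fill_value=0, margins=True)

print(multi)
print(totals)
```

Summing temperatures is not meaningful in practice; sum is used here only because it makes the totals easy to check by hand.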


Code Comparison

Example using pivot to reshape data without duplicates.

python
import pandas as pd

data = {'Date': ['2023-01-01', '2023-01-02', '2023-01-01', '2023-01-02'],
        'City': ['NY', 'NY', 'LA', 'LA'],
        'Temperature': [30, 32, 75, 77]}
df = pd.DataFrame(data)

pivot_df = df.pivot(index='Date', columns='City', values='Temperature')
print(pivot_df)
Output
City        LA  NY
Date
2023-01-01  75  30
2023-01-02  77  32

pivot_table Equivalent

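Using pivot_table to handle duplicates by aggregating with mean. The data here contains repeated Date/City pairs, which pivot would reject; pivot_table averages them, and combinations with no rows appear as NaN.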

python
import pandas as pd

data = {'Date': ['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-02'],
        'City': ['NY', 'NY', 'LA', 'LA'],
        'Temperature': [30, 31, 75, 77]}
df = pd.DataFrame(data)

pivot_table_df = df.pivot_table(index='Date', columns='City', values='Temperature', aggfunc='mean')
print(pivot_table_df)
Output
City          LA    NY
Date
2023-01-01   NaN  30.5
2023-01-02  76.0   NaN

When to Use Which

Choose pivot when your data has unique index and column pairs and you want a simple reshaping without aggregation. It is faster and simpler but only works if there are no duplicates.

Choose pivot_table when your data contains duplicates or you want to aggregate values during reshaping. It offers more flexibility with aggregation functions and handling missing data.
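One way to apply this rule in code is to check for duplicate index/column pairs before reshaping. The helper below is a hypothetical sketch, not part of pandas: the name reshape_wide and its parameters are assumptions made for illustration.

```python
import pandas as pd

def reshape_wide(df, index, columns, values, aggfunc='mean'):
    """Hypothetical helper: pivot when pairs are unique, else pivot_table."""
    # duplicated() flags rows whose (index, columns) pair was seen before.
    has_duplicates = df.duplicated(subset=[index, columns]).any()
    if has_duplicates:
        return df.pivot_table(index=index, columns=columns,
                              values=values, aggfunc=aggfunc)
    return df.pivot(index=index, columns=columns, values=values)

# Unique pairs: plain pivot is used and values pass through unchanged.
unique_df = pd.DataFrame({
    'Date': ['2023-01-01', '2023-01-02', '2023-01-01', '2023-01-02'],
    'City': ['NY', 'NY', 'LA', 'LA'],
    'Temperature': [30, 32, 75, 77],
})
wide_unique = reshape_wide(unique_df, 'Date', 'City', 'Temperature')

# Duplicate pairs: pivot_table is used and duplicates are averaged.
dup_df = pd.DataFrame({
    'Date': ['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-02'],
    'City': ['NY', 'NY', 'LA', 'LA'],
    'Temperature': [30, 31, 75, 77],
})
wide_dup = reshape_wide(dup_df, 'Date', 'City', 'Temperature')

print(wide_unique)
print(wide_dup)
```

Silently aggregating can hide data-quality problems, so in some pipelines it may be preferable to let pivot raise and investigate the duplicates instead.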

Key Takeaways

pivot reshapes data without aggregation and requires unique pairs.
pivot_table supports aggregation and handles duplicates gracefully.
Use pivot for simple, clean data reshaping.
Use pivot_table for complex reshaping with aggregation needs.
Default aggregation in pivot_table is mean but can be customized.