pivot vs pivot_table in pandas: Key Differences and Usage
pivot reshapes data without aggregation and requires unique index/column pairs, while pivot_table can aggregate data with a function like mean or sum when duplicates exist. Use pivot_table for flexible aggregation and pivot for simple reshaping with unique data.
Quick Comparison
Here is a quick side-by-side comparison of pivot and pivot_table in pandas.
| Feature | pivot | pivot_table |
|---|---|---|
| Purpose | Reshape data without aggregation | Reshape data with aggregation support |
| Handles duplicates | No, raises ValueError on duplicate index/column pairs | Yes, aggregates duplicates |
| Aggregation function | No aggregation function | Supports aggfunc like mean, sum |
| Default aggregation | N/A | Mean |
| Flexibility | Less flexible | More flexible |
| Use case | Simple reshaping with unique pairs | Complex reshaping with duplicates |
Key Differences
The pivot function in pandas is designed for simple reshaping of data where each combination of index and columns is unique. It does not allow duplicate entries for the same index/column pair and raises a ValueError if any exist. This makes it fast and straightforward, but limited to clean data without duplicates.
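The failure mode is easiest to see on a tiny frame. The sketch below uses made-up sample data (not taken from the examples later in this article) in which one Date/City pair appears twice, and catches the error that pivot raises.

```python
import pandas as pd

# Hypothetical sample data: the pair ('2023-01-01', 'NY') appears twice.
df = pd.DataFrame({
    "Date": ["2023-01-01", "2023-01-01", "2023-01-02"],
    "City": ["NY", "NY", "LA"],
    "Temperature": [30, 31, 75],
})

try:
    df.pivot(index="Date", columns="City", values="Temperature")
    raised = False
except ValueError as exc:
    # pivot refuses to reshape because it cannot place two values in one cell.
    raised = True
    print(f"pivot failed: {exc}")
```

Dropping or aggregating the duplicate rows first, or switching to pivot_table, avoids the error.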
On the other hand, pivot_table is more powerful and flexible. It can handle duplicate entries by applying an aggregation function such as mean, sum, or any custom function. By default, it uses the mean to aggregate duplicates. This makes pivot_table suitable for summarizing and reshaping data where duplicates or multiple values per group exist.
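As a rough illustration of the aggregation options, the sketch below runs pivot_table three times on the same made-up data: once with the default (mean), once with sum, and once with a custom function. The sample values and the `value_range` helper are assumptions for demonstration only.

```python
import pandas as pd

# Hypothetical sample data with a duplicated ('2023-01-01', 'NY') pair.
df = pd.DataFrame({
    "Date": ["2023-01-01", "2023-01-01", "2023-01-02"],
    "City": ["NY", "NY", "LA"],
    "Temperature": [30, 31, 75],
})


def value_range(values):
    """Custom aggregation (illustrative): spread between max and min."""
    return values.max() - values.min()


# No aggfunc given, so duplicates are averaged.
mean_table = df.pivot_table(index="Date", columns="City", values="Temperature")

# Same reshape, but duplicates are summed instead.
sum_table = df.pivot_table(
    index="Date", columns="City", values="Temperature", aggfunc="sum"
)

# Any callable that reduces a group to one value can be passed as aggfunc.
range_table = df.pivot_table(
    index="Date", columns="City", values="Temperature", aggfunc=value_range
)

print(mean_table)
print(sum_table)
print(range_table)
```

Date/City combinations that never occur in the data show up as NaN in each result.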
Additionally, pivot_table supports multiple aggregation functions, margins (totals), and filling missing values, which pivot does not. Therefore, pivot_table is preferred for complex data analysis tasks requiring aggregation, while pivot is best for quick reshaping when data is already unique.
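These extra options can be sketched as follows, again on made-up data. The example passes a list of aggregation functions, fills empty cells with `fill_value`, and adds totals with `margins`; the label "Total" is an arbitrary choice passed through `margins_name`.

```python
import pandas as pd

# Hypothetical sample data with a duplicated ('2023-01-01', 'NY') pair.
df = pd.DataFrame({
    "Date": ["2023-01-01", "2023-01-01", "2023-01-02"],
    "City": ["NY", "NY", "LA"],
    "Temperature": [30, 31, 75],
})

# Several aggregations at once; missing Date/City cells become 0 instead of NaN.
multi = df.pivot_table(
    index="Date",
    columns="City",
    values="Temperature",
    aggfunc=["mean", "max"],
    fill_value=0,
)

# Row and column totals, labelled "Total" (the label is our choice).
totals = df.pivot_table(
    index="Date",
    columns="City",
    values="Temperature",
    aggfunc="sum",
    fill_value=0,
    margins=True,
    margins_name="Total",
)

print(multi)
print(totals)
```

With a list of functions, the result has a two-level column index, so `multi["mean"]` selects the mean table on its own.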
Code Comparison
Example using pivot to reshape data without duplicates.
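    import pandas as pd

    # Each (Date, City) pair appears exactly once, so pivot works.
    data = {
        'Date': ['2023-01-01', '2023-01-02', '2023-01-01', '2023-01-02'],
        'City': ['NY', 'NY', 'LA', 'LA'],
        'Temperature': [30, 32, 75, 77],
    }
    df = pd.DataFrame(data)

    # Rows become dates, columns become cities, cells hold the temperature.
    pivot_df = df.pivot(index='Date', columns='City', values='Temperature')
    print(pivot_df)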
pivot_table Equivalent
Using pivot_table to handle duplicates by aggregating with mean.
    import pandas as pd

    # ('2023-01-01', 'NY') and ('2023-01-02', 'LA') each appear twice.
    data = {
        'Date': ['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-02'],
        'City': ['NY', 'NY', 'LA', 'LA'],
        'Temperature': [30, 31, 75, 77],
    }
    df = pd.DataFrame(data)

    # Duplicate pairs are averaged; combinations with no data become NaN.
    pivot_table_df = df.pivot_table(index='Date', columns='City', values='Temperature', aggfunc='mean')
    print(pivot_table_df)
When to Use Which
Choose pivot when your data has unique index and column pairs and you want a simple reshaping without aggregation. It is faster and simpler but only works if there are no duplicates.
Choose pivot_table when your data contains duplicates or you want to aggregate values during reshaping. It offers more flexibility with aggregation functions and handling missing data.
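One way to turn this rule of thumb into code is to check for duplicate index/column pairs before reshaping. The `reshape` helper below is a hypothetical convenience function, not part of pandas; it is a minimal sketch that relies on `DataFrame.duplicated` to make the choice.

```python
import pandas as pd


def reshape(df, index, columns, values, aggfunc="mean"):
    """Hypothetical helper: pivot when pairs are unique, pivot_table otherwise."""
    has_duplicates = df.duplicated(subset=[index, columns]).any()
    if has_duplicates:
        # Duplicate pairs must be aggregated into a single cell.
        return df.pivot_table(
            index=index, columns=columns, values=values, aggfunc=aggfunc
        )
    # Unique pairs: a plain reshape is enough.
    return df.pivot(index=index, columns=columns, values=values)


# Illustrative data: one frame with unique pairs, one with duplicates.
unique_df = pd.DataFrame({
    "Date": ["2023-01-01", "2023-01-02", "2023-01-01", "2023-01-02"],
    "City": ["NY", "NY", "LA", "LA"],
    "Temperature": [30, 32, 75, 77],
})
duplicate_df = pd.DataFrame({
    "Date": ["2023-01-01", "2023-01-01", "2023-01-02", "2023-01-02"],
    "City": ["NY", "NY", "LA", "LA"],
    "Temperature": [30, 31, 75, 77],
})

unique_result = reshape(unique_df, "Date", "City", "Temperature")
duplicate_result = reshape(duplicate_df, "Date", "City", "Temperature")
print(unique_result)
print(duplicate_result)
```

Silently aggregating can hide data-quality problems, so in practice it may be preferable to log or inspect the duplicates before falling back to pivot_table.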
Key Takeaways
- pivot reshapes data without aggregation and requires unique pairs.
- pivot_table supports aggregation and handles duplicates gracefully.
- Use pivot for simple, clean data reshaping.
- Use pivot_table for complex reshaping with aggregation needs.
- The default aggregation in pivot_table is mean but can be customized.