Aggregation functions help us find simple summaries of data, like totals, averages, or how spread out numbers are.
Aggregation functions (sum, mean, std) in Data Analysis Python
Introduction
Aggregation functions are useful in situations like these:
When you want to find the total sales from a list of daily sales numbers.
When you need to calculate the average temperature over a week.
When you are checking how much variation there is in students' test scores.
When you are summarizing data to understand overall trends quickly.
When you are preparing data for reports or visualizations.
Syntax
Data Analysis Python
data.sum()
data.mean()
data.std()
These functions are most often called on a pandas Series or on the columns of a DataFrame.
On a Series, each returns a single number summarizing the data; on a DataFrame, each returns one value per column.
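To make the difference between a single column and a whole table concrete, here is a minimal sketch. The table `df`, its column names, and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical table of test scores; names and values are illustrative only.
df = pd.DataFrame({
    "math": [70, 80, 90],
    "science": [60, 75, 90],
})

# On a DataFrame, an aggregation summarizes each column separately
# and returns a Series indexed by column name.
column_totals = df.sum()
column_means = df.mean()

# On one column (a Series), it returns a single number.
math_mean = df["math"].mean()

print(column_totals)
print(column_means)
print(math_mean)
```

Selecting a column first is the usual way to get one number back; calling the function on the whole table gives one number per column.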
Examples
This adds all numbers: 10 + 20 + 30 + 40 = 100.
Data Analysis Python
import pandas as pd
numbers = pd.Series([10, 20, 30, 40])
print(numbers.sum())
This calculates the average: (10 + 20 + 30 + 40) / 4 = 25.
Data Analysis Python
import pandas as pd
numbers = pd.Series([10, 20, 30, 40])
print(numbers.mean())
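This measures how spread out the numbers are around the average. By default pandas computes the sample standard deviation (dividing by n - 1), which is about 12.91 for these numbers.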
Data Analysis Python
import pandas as pd
numbers = pd.Series([10, 20, 30, 40])
print(numbers.std())
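The standard deviation is the least obvious of the three, so it can help to see the arithmetic written out. The sketch below compares the pandas default (`ddof=1`, the sample standard deviation) with `ddof=0` (the population standard deviation) and with a by-hand calculation. The variable names are chosen for illustration.

```python
import math
import pandas as pd

numbers = pd.Series([10, 20, 30, 40])

sample_std = numbers.std()            # default ddof=1: divide by n - 1
population_std = numbers.std(ddof=0)  # divide by n instead

# The same sample calculation, step by step.
mean = numbers.mean()                               # 25.0
squared_diffs = [(x - mean) ** 2 for x in numbers]  # 225, 25, 25, 225
manual_std = math.sqrt(sum(squared_diffs) / (len(numbers) - 1))

print(round(sample_std, 2))      # 12.91
print(round(population_std, 2))  # 11.18
print(round(manual_std, 2))      # 12.91
```

The two results differ only in the divisor. For small datasets the gap is noticeable, so it is worth knowing which one a report expects.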
Sample Program
This program shows how to use sum, mean, and std to summarize daily sales data.
Data Analysis Python
import pandas as pd

# Create a small dataset of daily sales
sales = pd.Series([100, 150, 200, 130, 170])

# Calculate total sales
total_sales = sales.sum()

# Calculate average sales
average_sales = sales.mean()

# Calculate sales standard deviation
sales_std = sales.std()

print(f"Total sales: {total_sales}")
print(f"Average sales: {average_sales}")
print(f"Sales standard deviation: {sales_std:.2f}")
Output
Total sales: 750
Average sales: 150.0
Sales standard deviation: 38.08
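As an optional variation on the sample program, the three summaries can be requested in one call with the pandas `agg` method. This is a sketch of an alternative, not a replacement for the program above; the name `summary` is chosen for illustration.

```python
import pandas as pd

sales = pd.Series([100, 150, 200, 130, 170])

# agg accepts a list of aggregation names and returns a Series
# whose index labels are those names.
summary = sales.agg(["sum", "mean", "std"])

print(summary)
print(f"Sales standard deviation: {summary['std']:.2f}")
```

Separate calls to `sum()`, `mean()`, and `std()` are often easier to read for beginners; `agg` is convenient once the list of summaries grows.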
Important Notes
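Standard deviation (std) tells you how much the numbers typically differ from the average. pandas computes the sample standard deviation by default (ddof=1), while NumPy's std defaults to the population version (ddof=0).
If your data has missing values (NaN), pandas skips them by default (skipna=True), so they do not affect the result.
You can call these functions on a whole DataFrame to get one summary value per column.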
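The note on missing values can be checked with a short sketch. The Series `readings` and its values are made up for illustration; `skipna` is the pandas parameter that controls whether NaN is ignored.

```python
import pandas as pd

# Hypothetical readings with one missing value.
readings = pd.Series([10.0, float("nan"), 30.0])

total = readings.sum()    # NaN is skipped: 10.0 + 30.0
average = readings.mean() # averaged over the 2 present values, not 3

# With skipna=False the missing value propagates into the result.
strict_total = readings.sum(skipna=False)

print(total)         # 40.0
print(average)       # 20.0
print(strict_total)  # nan
```

Because the mean is taken over the values that are present, a column with many gaps can give a misleading average. Checking `readings.count()` alongside the summary shows how many values were actually used.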
Summary
Aggregation functions give quick summaries like total, average, and spread.
They help understand data without looking at every number.
sum(), mean(), and std() are common and easy to use in data analysis.