# Descriptive Statistics Review in Python Data Analysis: Time & Space Complexity
We want to understand how the time to calculate descriptive statistics changes as the data size grows.
How does the work increase when we have more data points?
Analyze the time complexity of the following code snippet.
```python
import pandas as pd

def describe_data(df):
    # Each statistic scans the 'value' column of the DataFrame
    mean_val = df['value'].mean()
    median_val = df['value'].median()
    std_val = df['value'].std()
    count_val = df['value'].count()
    return mean_val, median_val, std_val, count_val
```
This code calculates basic descriptive statistics (mean, median, standard deviation, count) on one column of a data table.
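To make the "scan the column" idea concrete, here is a plain-Python sketch (no pandas) of computing count, mean, and standard deviation in a single traversal; the function name `one_pass_stats` and the use of a plain list are illustrative, not part of the original code:

```python
import math

def one_pass_stats(values):
    # One traversal accumulates count, sum, and sum of squares: O(n).
    n = 0
    total = 0.0
    total_sq = 0.0
    for v in values:          # n iterations
        n += 1
        total += v
        total_sq += v * v
    mean = total / n
    # Sample variance recovered from the accumulated sums
    # (ddof=1, matching the default of pandas' .std())
    variance = (total_sq - n * mean * mean) / (n - 1)
    return n, mean, math.sqrt(variance)

n, mean, std = one_pass_stats([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
```

The key point is that all three statistics come from a single pass over the n values, so the work is proportional to n.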
Identify the repeated work: loops, recursion, or array traversals.
- Primary operation: scanning the column values to compute each statistic.
- How many times: each statistic makes one pass (or a small constant number of passes) over all n values.
As the number of data points increases, the time to compute each statistic grows roughly in direct proportion.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 10 operations per statistic |
| 100 | About 100 operations per statistic |
| 1000 | About 1000 operations per statistic |
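One way to observe the table's pattern directly is to count element accesses instead of measuring wall-clock time; this sketch wraps a list so every read is tallied (the `CountingList` helper and the hand-rolled `mean` are illustrative assumptions, not pandas internals):

```python
class CountingList:
    """List wrapper that counts how many elements are read during iteration."""
    def __init__(self, data):
        self.data = data
        self.reads = 0

    def __iter__(self):
        for v in self.data:
            self.reads += 1
            yield v

def mean(values):
    # One full pass over the data: O(n)
    total = 0.0
    n = 0
    for v in values:
        total += v
        n += 1
    return total / n

for size in (10, 100, 1000):
    xs = CountingList(list(range(size)))
    mean(xs)
    print(size, xs.reads)   # reads grow linearly with size: 10, 100, 1000
```

Doubling the input size doubles the number of reads, which is exactly the linear pattern in the table above.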
Pattern observation: The work grows linearly with the number of data points.
Time Complexity: O(n)
This means the time to calculate descriptive statistics grows directly with the size of the data.
[X] Wrong: "Calculating mean and median takes the same time regardless of data size."
[OK] Correct: Both mean and median must examine every value in the column, so more data means more work.
Understanding how descriptive statistics scale helps you explain data processing steps clearly and shows you can think about efficiency in real tasks.
"What if we added sorting to find the median? How would the time complexity change?"