Python Program to Find Standard Deviation
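To find the standard deviation in Python, import math and compute std_dev = math.sqrt(sum((x - mean) ** 2 for x in data) / n), where mean is the average of data and n is the number of values.
Examples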
How to Think About It
Algorithm
Code
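    import math

    def standard_deviation(data):
        """Return the population standard deviation of a non-empty sequence."""
        n = len(data)
        if n == 0:
            raise ValueError("data must contain at least one value")
        mean = sum(data) / n
        # Population variance: the mean of the squared deviations.
        variance = sum((x - mean) ** 2 for x in data) / n
        return math.sqrt(variance)

    # Example usage
    numbers = [2, 4, 4, 4, 5, 5, 7, 9]
    print(standard_deviation(numbers))  # 2.0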
Dry Run
Let's trace the example [2, 4, 4, 4, 5, 5, 7, 9] through the code.
Calculate length and mean
n = 8, mean = (2+4+4+4+5+5+7+9)/8 = 5.0
Calculate squared differences
[(2-5)^2=9, (4-5)^2=1, (4-5)^2=1, (4-5)^2=1, (5-5)^2=0, (5-5)^2=0, (7-5)^2=4, (9-5)^2=16]
Calculate variance
variance = (9+1+1+1+0+0+4+16)/8 = 4.0
Calculate standard deviation
std_dev = sqrt(4.0) = 2.0
| Number | Difference from Mean | Squared Difference |
|---|---|---|
| 2 | 2 - 5 = -3 | 9 |
| 4 | 4 - 5 = -1 | 1 |
| 4 | 4 - 5 = -1 | 1 |
| 4 | 4 - 5 = -1 | 1 |
| 5 | 5 - 5 = 0 | 0 |
| 5 | 5 - 5 = 0 | 0 |
| 7 | 7 - 5 = 2 | 4 |
| 9 | 9 - 5 = 4 | 16 |
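The trace above can be reproduced with a short script. This is a minimal sketch: the helper name trace_rows is an illustrative assumption and is not part of the original program.

```python
import math

def trace_rows(data):
    """Return (value, difference from mean, squared difference) per value."""
    mean = sum(data) / len(data)
    return [(x, x - mean, (x - mean) ** 2) for x in data]

numbers = [2, 4, 4, 4, 5, 5, 7, 9]
rows = trace_rows(numbers)

# Print one line per table row.
for value, diff, squared in rows:
    print(f"{value} | {diff:+.1f} | {squared:.1f}")

# Average the squared differences, then take the square root.
variance = sum(squared for _, _, squared in rows) / len(numbers)
print("variance =", variance)             # variance = 4.0
print("std_dev =", math.sqrt(variance))   # std_dev = 2.0
```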
Why This Works
Step 1: Calculate the mean
The mean is the average value, found by adding all the numbers and dividing by the count: sum(data) / n.
Step 2: Find squared differences
Each number's difference from the mean is squared with (x - mean) ** 2. Squaring removes negative signs, so deviations above and below the mean do not cancel out, and it gives larger deviations more weight.
Step 3: Calculate variance and standard deviation
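Variance is the average of the squared differences. Because the sum is divided by n rather than n - 1, this is the population variance. Standard deviation is its square root, math.sqrt(variance), which brings the result back to the original units of the data.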
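In standard notation, the three steps correspond to the population formulas below. The symbols are conventional choices rather than names used in the program: \mu is the mean, \sigma^2 the variance, and \sigma the standard deviation.

```latex
\mu = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2, \qquad
\sigma = \sqrt{\sigma^2}
```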
Alternative Approaches
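Using the statistics module:

    import statistics

    numbers = [2, 4, 4, 4, 5, 5, 7, 9]
    print(statistics.pstdev(numbers))  # 2.0

Using NumPy:

    import numpy as np

    numbers = np.array([2, 4, 4, 4, 5, 5, 7, 9])
    # np.std() uses ddof=0 by default, i.e. the population standard deviation.
    print(np.std(numbers))  # 2.0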
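As a quick sanity check, the manual calculation can be compared against statistics.pstdev(). This is a minimal sketch using only the standard library. The function is restated so the snippet runs on its own, and math.isclose is used because floating-point results can differ in the last digits.

```python
import math
import statistics

def standard_deviation(data):
    n = len(data)
    mean = sum(data) / n
    return math.sqrt(sum((x - mean) ** 2 for x in data) / n)

numbers = [2, 4, 4, 4, 5, 5, 7, 9]
manual = standard_deviation(numbers)
builtin = statistics.pstdev(numbers)

print(manual, builtin)
print(math.isclose(manual, builtin))  # True

# statistics.stdev() divides by n - 1 (sample standard deviation),
# so it gives a larger value than pstdev() on the same data.
print(statistics.stdev(numbers) > builtin)  # True
```

If the data is a sample rather than the whole population, statistics.stdev() is usually the more appropriate call.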
Complexity: O(n) time, O(1) space
Time Complexity
The program loops through the data twice: once to calculate the mean and once to calculate squared differences, so it runs in linear time O(n).
Space Complexity
The program uses a fixed amount of extra space regardless of input size, so space complexity is O(1).
Which Approach is Fastest?
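For large numeric data, NumPy's np.std() is typically the fastest because it runs vectorized loops in compiled C code. statistics.pstdev() is written for accuracy rather than speed and is often slower than the manual loop, so it is best suited to small and medium inputs.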
| Approach | Time | Space | Best For |
|---|---|---|---|
| Manual calculation | O(n) | O(1) | Learning and small data |
| statistics.pstdev() | O(n) | O(1) | Simple, built-in, small to medium data |
| numpy np.std() | O(n) | O(n) (temporary arrays) | Large data and performance |
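Relative speed depends on the Python version, the hardware, and the data size, so it is worth measuring rather than assuming. The sketch below times the manual function against statistics.pstdev() with timeit. The data size and repeat count are arbitrary choices, and NumPy is left out so the snippet needs only the standard library.

```python
import math
import random
import statistics
import timeit

def standard_deviation(data):
    n = len(data)
    mean = sum(data) / n
    return math.sqrt(sum((x - mean) ** 2 for x in data) / n)

random.seed(0)  # fixed seed so the same data is generated on every run
data = [random.random() for _ in range(5_000)]

# Time 5 calls of each approach; absolute numbers vary by machine.
manual_time = timeit.timeit(lambda: standard_deviation(data), number=5)
builtin_time = timeit.timeit(lambda: statistics.pstdev(data), number=5)

print(f"manual:              {manual_time:.4f} s")
print(f"statistics.pstdev(): {builtin_time:.4f} s")
```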
Use statistics.pstdev() for a quick population standard deviation without any extra dependencies.