Performance Analysis: Why Time-Based Analysis Reveals Trends in Python Data Analysis
When we analyze data over time, we want to see how patterns change as more time passes.
We ask: How does the work needed to find trends grow when we have more time points?
Analyze the time complexity of the following code snippet.
```python
def calculate_moving_average(data, window_size):
    """Return the moving average of `data` over sliding windows of `window_size`."""
    moving_averages = []
    for i in range(len(data) - window_size + 1):
        window = data[i : i + window_size]          # slice out one window (k elements)
        moving_averages.append(sum(window) / window_size)  # O(k) sum per window
    return moving_averages

# Example usage:
data = [10, 20, 30, 40, 50, 60, 70]
result = calculate_moving_average(data, 3)  # [20.0, 30.0, 40.0, 50.0, 60.0]
```
This code calculates a moving average over a time series to reveal trends.
Identify the repeated operations: loops, recursion, or array traversals.
- Primary operation: looping through the data and summing each window to compute its average.
- How many times: the loop runs once per window, exactly n - k + 1 times, where n is the data length and k is the window size. For small k this is about n iterations, and each iteration sums k elements.
As the data size grows, the number of windows to average grows at roughly the same rate.
| Input Size (n) | Windows to Average (k = 3) |
|---|---|
| 10 | 8 |
| 100 | 98 |
| 1000 | 998 |
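The table values can be checked directly: there are n - k + 1 windows for n data points and window size k, so the result list has exactly that many entries. A minimal check, re-defining the function here so the snippet is self-contained:

```python
def calculate_moving_average(data, window_size):
    moving_averages = []
    for i in range(len(data) - window_size + 1):
        window = data[i : i + window_size]
        moving_averages.append(sum(window) / window_size)
    return moving_averages

# One averaging step per window: n - k + 1 windows with k = 3.
for n in (10, 100, 1000):
    result = calculate_moving_average(list(range(n)), 3)
    print(n, len(result))  # 8, 98, and 998 windows respectively
```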
Pattern observation: for a fixed window size, the work grows linearly with data size.
Time Complexity: O(n * k)
This means the time to find trends grows with the product of the number of data points (n) and the window size (k): each of the roughly n windows requires summing k elements. When k is a fixed constant, this simplifies to O(n).
[X] Wrong: "Calculating moving averages takes the same time no matter how much data there is."
[OK] Correct: More data means more windows to average, so the work grows with data size.
Understanding how time-based data grows helps you explain how your analysis scales with more data.
"What if we used a cumulative sum to calculate moving averages instead? How would the time complexity change?"