
Moving averages in ML Python - ML Experiment: Train & Evaluate

Experiment - Moving averages
Problem: You want to smooth noisy time series data using moving averages to better see trends.
Current Metrics: Raw data has high variance and noise, making trend detection difficult.
Issue: The data is too noisy, so simple plots are unclear and predictions based on raw data are unstable.
Your Task
Apply moving averages to smooth the time series data and reduce noise while preserving trend information.
Use only simple moving average (SMA) or exponential moving average (EMA).
Do not use complex models or external smoothing libraries.
Solution
ML Python
import numpy as np
import matplotlib.pyplot as plt

# Generate noisy time series data
np.random.seed(42)
time = np.arange(100)
true_signal = np.sin(0.2 * time)  # underlying trend
noise = np.random.normal(0, 0.5, size=time.shape)
noisy_data = true_signal + noise

# Simple Moving Average function
def simple_moving_average(data, window_size):
    return np.convolve(data, np.ones(window_size)/window_size, mode='valid')

# Exponential Moving Average function
def exponential_moving_average(data, alpha):
    ema = [data[0]]
    for point in data[1:]:
        ema.append(alpha * point + (1 - alpha) * ema[-1])
    return np.array(ema)

# Apply SMA with window size 5
sma_5 = simple_moving_average(noisy_data, 5)

# Apply EMA with alpha 0.2
ema_02 = exponential_moving_average(noisy_data, 0.2)

# Plot results
plt.figure(figsize=(10,6))
plt.plot(time, noisy_data, label='Noisy Data', alpha=0.5)
plt.plot(time, true_signal, label='True Signal', linewidth=2)
plt.plot(time[4:], sma_5, label='SMA (window=5)', linewidth=2)  # 'valid' mode drops the first window_size - 1 = 4 points
plt.plot(time, ema_02, label='EMA (alpha=0.2)', linewidth=2)
plt.legend()
plt.title('Moving Averages for Smoothing Noisy Time Series')
plt.xlabel('Time')
plt.ylabel('Value')
plt.show()
Added simple moving average function with adjustable window size.
Added exponential moving average function with smoothing factor alpha.
Applied both methods to noisy data to reduce noise and highlight trend.
Visualized original noisy data, true signal, and smoothed results for comparison.
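The plot pairs the SMA with time[4:] because mode='valid' returns only the positions where the window fits entirely inside the data. A minimal sketch of that alignment for an arbitrary window, assuming a trailing window (each average is placed at the last point it covers). The helper name sma_with_time is hypothetical and not part of the solution above:

```python
import numpy as np

def sma_with_time(time, data, window_size):
    """Return (aligned_time, sma) for a trailing simple moving average."""
    kernel = np.ones(window_size) / window_size
    sma = np.convolve(data, kernel, mode='valid')  # length: len(data) - window_size + 1
    aligned_time = time[window_size - 1:]          # each average sits at its window's last point
    return aligned_time, sma

time = np.arange(10)
data = np.arange(10, dtype=float)  # a straight line, easy to check by hand
aligned_time, sma = sma_with_time(time, data, window_size=5)
print(len(aligned_time), len(sma))  # both arrays have the same length
print(aligned_time[0], sma[0])      # first average covers points 0..4
```

Slicing with window_size - 1 instead of a literal 4 keeps the x-axis and the averages the same length when the window changes.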
Results Interpretation

Before: The noisy data fluctuates widely around the true signal, making it hard to see the trend.

After: The moving averages smooth out the noise, showing a clearer trend line closer to the true signal.

Moving averages help reduce noise in time series data, making trends easier to identify without complex models.
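One rough way to put a number on "smoother" is the standard deviation of point-to-point changes. This metric, the roughness helper, and the default_rng seed are assumptions made for illustration (the random values differ from those produced by np.random.seed(42) in the solution), so treat it as a sketch rather than the experiment's own evaluation:

```python
import numpy as np

rng = np.random.default_rng(42)   # hypothetical seed, not the solution's np.random.seed(42)
time = np.arange(100)
noisy = np.sin(0.2 * time) + rng.normal(0, 0.5, size=time.shape)

# SMA with window 5, as in the solution
sma = np.convolve(noisy, np.ones(5) / 5, mode='valid')

# EMA with alpha 0.2, as in the solution
ema = np.empty_like(noisy)
ema[0] = noisy[0]
for i in range(1, len(noisy)):
    ema[i] = 0.2 * noisy[i] + 0.8 * ema[i - 1]

def roughness(series):
    """Standard deviation of point-to-point changes; lower means smoother."""
    return np.std(np.diff(series))

for name, series in [('raw', noisy), ('SMA(5)', sma), ('EMA(0.2)', ema)]:
    print(f'{name:9s} roughness = {roughness(series):.3f}')
```

Lower roughness only says the curve is less jagged. It does not measure lag, so a smoother curve is not automatically a closer match to the true signal.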
Bonus Experiment
Try using different window sizes for SMA and different alpha values for EMA to see how smoothing strength affects trend clarity.
💡 Hint
Larger SMA windows smooth more but may lag behind trends; higher EMA alpha reacts faster but may keep more noise.
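The lag trade-off in the hint is easiest to see on a noise-free input. The sketch below reuses the two functions from the solution on a hypothetical step series (the level jumps from 0 to 1 at index 10) and reports how quickly each setting catches up:

```python
import numpy as np

def simple_moving_average(data, window_size):
    return np.convolve(data, np.ones(window_size) / window_size, mode='valid')

def exponential_moving_average(data, alpha):
    ema = [data[0]]
    for point in data[1:]:
        ema.append(alpha * point + (1 - alpha) * ema[-1])
    return np.array(ema)

# Hypothetical test input: the level jumps from 0 to 1 at index 10
step = np.concatenate([np.zeros(10), np.ones(20)])

for window in (3, 9):
    sma = simple_moving_average(step, window)
    # Index into `step` at which the trailing average first equals the new level
    first_full = int(np.argmax(np.isclose(sma, 1.0))) + window - 1
    print(f'SMA window={window}: fully caught up at index {first_full}')

for alpha in (0.1, 0.5):
    ema = exponential_moving_average(step, alpha)
    # Index 12 is the third sample at the new level
    print(f'EMA alpha={alpha}: value after 3 samples at the new level = {ema[12]:.3f}')
```

A trailing SMA needs a full window of new values before it reaches the new level, so the wider window catches up later. The EMA never jumps all at once, but a higher alpha closes the gap in fewer steps.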

Practice

1. What is the main purpose of using a moving average in data analysis?
easy
A. To smooth out short-term fluctuations and highlight longer-term trends
B. To increase the number of data points in a dataset
C. To remove all noise from the data completely
D. To predict exact future values without error

Solution

  1. Step 1: Understand the role of moving averages

    Moving averages smooth data by averaging nearby points, reducing short-term ups and downs.
  2. Step 2: Identify the main goal

    The goal is to reveal longer-term trends by reducing noise, not to remove noise completely or predict exact values.
  3. Final Answer:

    To smooth out short-term fluctuations and highlight longer-term trends -> Option A
  4. Quick Check:

    Moving average = smoothing trends [OK]
Hint: Moving averages smooth data to show trends clearly [OK]
Common Mistakes:
  • Thinking moving averages increase data points
  • Believing moving averages remove all noise
  • Assuming moving averages predict exact future values
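A small worked example may make the three mistakes above concrete. The values are hypothetical: a rising trend with a zigzag on top, averaged with a window of 3.

```python
data = [1, 5, 3, 7, 5, 9]   # hypothetical values: a rising trend with a zigzag on top
window = 3

smoothed = [sum(data[i:i + window]) / window for i in range(len(data) - window + 1)]

# Largest jump between neighbouring points, before and after smoothing
raw_jump = max(abs(b - a) for a, b in zip(data, data[1:]))
smooth_jump = max(abs(b - a) for a, b in zip(smoothed, smoothed[1:]))

print(smoothed)                  # [3.0, 5.0, 5.0, 7.0]
print(len(data), len(smoothed))  # 6 4
print(raw_jump, smooth_jump)     # 4 2.0
```

The output has fewer points than the input, not more, and the jumps shrink without disappearing, which matches option A rather than B or C.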
2. Which of the following Python code snippets correctly computes a simple moving average with window size 3 for a list named data?
easy
A. [data[i] / 3 for i in range(len(data))]
B. [sum(data[i:i+3]) for i in range(len(data)-3)]
C. [sum(data[i:i+3]) / 3 for i in range(len(data)-3)]
D. [(data[i] + data[i+1] + data[i+2]) / 3 for i in range(len(data)-2)]

Solution

  1. Step 1: Understand moving average calculation

    A simple moving average with window 3 averages each group of 3 consecutive elements.
  2. Step 2: Check each option's correctness

    Option D sums three consecutive elements and divides by 3, iterating over range(len(data)-2) so that every full window is covered.
    Option B sums each window but never divides by 3, and its range(len(data)-3) also omits the last window.
    Option C divides by 3 but uses range(len(data)-3), which omits the last window.
    Option A divides single elements by 3 instead of averaging groups.
  3. Final Answer:

    [(data[i] + data[i+1] + data[i+2]) / 3 for i in range(len(data)-2)] -> Option D
  4. Quick Check:

    Sum 3 elements / 3, range correct = [(data[i] + data[i+1] + data[i+2]) / 3 for i in range(len(data)-2)] [OK]
Hint: Sum 3 elements and divide by 3, loop till len-2 [OK]
Common Mistakes:
  • Forgetting to divide by window size
  • Using the wrong range length, which drops the last window (too short) or raises an IndexError with explicit indexing (too long)
  • Averaging single elements instead of groups
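As a cross-check, the explicit-index form from option D can be compared with a slice-based version and with NumPy's convolution. The sample list is hypothetical, chosen so the averages are whole numbers:

```python
import numpy as np

data = [3, 6, 9, 12, 15]   # hypothetical sample list

# Option D: explicit indexing
by_index = [(data[i] + data[i + 1] + data[i + 2]) / 3 for i in range(len(data) - 2)]

# Same result with slicing; range(len(data) - 3 + 1) equals range(len(data) - 2)
by_slice = [sum(data[i:i + 3]) / 3 for i in range(len(data) - 3 + 1)]

# Cross-check against NumPy's convolution in 'valid' mode
by_numpy = np.convolve(data, np.ones(3) / 3, mode='valid')

print(by_index)                         # [6.0, 9.0, 12.0]
print(by_index == by_slice)             # True
print(np.allclose(by_numpy, by_index))  # True
```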
3. Given the code below, what is the output?
data = [2, 4, 6, 8, 10]
window = 2
moving_avg = [sum(data[i:i+window]) / window for i in range(len(data) - window + 1)]
print(moving_avg)
medium
A. [2.0, 4.0, 6.0, 8.0, 10.0]
B. [3.0, 5.0, 7.0]
C. [3.0, 5.0, 7.0, 9.0]
D. [6.0, 8.0, 10.0]

Solution

  1. Step 1: Calculate moving averages manually

    Window size is 2, so average pairs:
    (2+4)/2=3.0
    (4+6)/2=5.0
    (6+8)/2=7.0
    (8+10)/2=9.0
  2. Step 2: Confirm output list length and values

    Length is len(data)-window+1 = 5-2+1=4, matching 4 values above.
  3. Final Answer:

    [3.0, 5.0, 7.0, 9.0] -> Option C
  4. Quick Check:

    Pairs averaged = [3.0, 5.0, 7.0, 9.0] [OK]
Hint: Average pairs sliding by one, length = len - window + 1 [OK]
Common Mistakes:
  • Confusing window size with output length
  • Calculating sums but forgetting to divide
  • Off-by-one errors in range length
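The length rule len(data) - window + 1 can be checked for several window sizes at once. The wrapper below is a hypothetical helper built around the same comprehension as the question, with an added guard for windows that do not fit:

```python
def moving_average(data, window):
    """Trailing simple moving average; returns len(data) - window + 1 values."""
    if window < 1 or window > len(data):
        raise ValueError('window must be between 1 and len(data)')
    return [sum(data[i:i + window]) / window for i in range(len(data) - window + 1)]

data = [2, 4, 6, 8, 10]
for window in (1, 2, 5):
    result = moving_average(data, window)
    print(window, len(result), result)
```

With window 1 the output repeats the input as floats, with window 2 it is the list from option C, and with window 5 it collapses to the single overall mean.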
4. The following code is intended to compute a moving average with window size 3, but it misses the last window. What is the problem?
data = [1, 2, 3, 4, 5]
window = 3
moving_avg = [sum(data[i:i+window]) / window for i in range(len(data)-window)]
print(moving_avg)
medium
A. The range should be len(data) - window + 1 to include the last window
B. The window size is too large for the data list
C. sum() cannot be used on list slices
D. Division by window size should be outside the list comprehension

Solution

  1. Step 1: Analyze the range length

    Range is len(data)-window = 5-3=2, but to cover all windows it should be len(data)-window+1 = 3.
  2. Step 2: Understand impact of incorrect range

    Using len(data)-window misses the last valid window slice, causing incomplete results.
  3. Final Answer:

    The range should be len(data) - window + 1 to include the last window -> Option A
  4. Quick Check:

    Range length = len - window + 1 [OK]
Hint: Use range(len(data) - window + 1) for full coverage [OK]
Common Mistakes:
  • Using len(data) - window instead of +1
  • Thinking sum() can't handle slices
  • Misplacing division outside comprehension
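Running the original and the corrected range side by side shows exactly which window goes missing:

```python
data = [1, 2, 3, 4, 5]
window = 3

# Original: range(len(data) - window) stops one window early
buggy = [sum(data[i:i + window]) / window for i in range(len(data) - window)]

# Fixed: the + 1 includes the window that ends at the last element
fixed = [sum(data[i:i + window]) / window for i in range(len(data) - window + 1)]

print(buggy)   # [2.0, 3.0]
print(fixed)   # [2.0, 3.0, 4.0]
```

The missing value 4.0 is the average of [3, 4, 5], the only window that includes the final element.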
5. You have daily sales data for 10 days: [10, 12, 11, 14, 13, 15, 16, 14, 13, 12]. You want to smooth this data using a moving average with window size 4 but only want to keep averages where the window's average is greater than 13. Which Python code correctly computes this filtered moving average?
hard
A. [sum(data[i:i+4])/4 for i in range(len(data)-4) if sum(data[i:i+4])/4 > 13]
B. [avg for i in range(len(data)-3) if (avg := sum(data[i:i+4])/4) > 13]
C. [sum(data[i:i+4])/4 for i in range(len(data)-3) if sum(data[i:i+4]) > 13]
D. [sum(data[i:i+4])/4 for i in range(len(data)-3) if sum(data[i:i+4])/4 < 13]

Solution

  1. Step 1: Understand window size and range

    Window size 4 means averaging groups of 4 elements, so the range is len(data)-4+1 = len(data)-3 = 10-3 = 7, giving 7 windows.
  2. Step 2: Filter averages greater than 13

    Option B uses an assignment expression (the walrus operator :=, available from Python 3.8) to compute each average once and keep it only if it is > 13.
    Option A uses the wrong range (len(data)-4 = 6), so it misses the last window.
    Option C filters on sum > 13, not average > 13.
    Option D filters averages less than 13, the opposite condition.
  3. Final Answer:

    [avg for i in range(len(data)-3) if (avg := sum(data[i:i+4])/4) > 13] -> Option B
  4. Quick Check:

    Use assignment expression to filter averages > 13 [OK]
Hint: Use assignment expression (walrus) to filter averages [OK]
Common Mistakes:
  • Using wrong range length missing last windows
  • Filtering sum instead of average
  • Using wrong comparison operator
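Running option B on the sales data, next to option A for contrast, shows why the range matters here: the last window also passes the filter. This sketch assumes Python 3.8 or later for the := operator.

```python
data = [10, 12, 11, 14, 13, 15, 16, 14, 13, 12]

# Option B: the walrus operator (Python 3.8+) computes each average once
option_b = [avg for i in range(len(data) - 3) if (avg := sum(data[i:i + 4]) / 4) > 13]

# Option A: same filter, but the range stops one window early
option_a = [sum(data[i:i + 4]) / 4 for i in range(len(data) - 4) if sum(data[i:i + 4]) / 4 > 13]

print(option_b)   # [13.25, 14.5, 14.5, 14.5, 13.75]
print(option_a)   # [13.25, 14.5, 14.5, 14.5]
```

The first two windows average 11.75 and 12.5 and are filtered out. Option A loses 13.75, the average of the final window [16, 14, 13, 12].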