Monitoring and observability in Prompt Engineering / GenAI - ML Experiment: Train & Evaluate

Experiment - Monitoring and observability

Problem: You have a machine learning model deployed in production. The model's performance suddenly drops, but you don't know why because there is no monitoring or observability in place.

Current Metrics: No metrics are currently collected, so the model's accuracy and latency in production are unknown.

Issue: Without monitoring and observability, performance issues cannot be detected or diagnosed in real time.

Your Task

Implement monitoring and observability to track model accuracy, latency, and resource usage in production. Set up alerts for performance drops.

Use only open-source tools or libraries.

Do not change the model architecture or training process.

Focus on adding monitoring without impacting model inference speed significantly.

Solution

import time
import random
from collections import deque

# Simulated model prediction function
def model_predict(input_data):
    # Simulate prediction latency
    time.sleep(random.uniform(0.01, 0.05))
    # Simulate prediction output
    return random.choice([0, 1])

# Monitoring class to track metrics
class ModelMonitor:
    def __init__(self, window_size=100):
        self.predictions = deque(maxlen=window_size)
        self.labels = deque(maxlen=window_size)
        self.latencies = deque(maxlen=window_size)

    def log_prediction(self, prediction, label, latency):
        self.predictions.append(prediction)
        self.labels.append(label)
        self.latencies.append(latency)

    def compute_accuracy(self):
        if not self.labels:
            return None
        correct = sum(pred == label for pred, label in zip(self.predictions, self.labels))
        return correct / len(self.labels)

    def compute_avg_latency(self):
        if not self.latencies:
            return None
        return sum(self.latencies) / len(self.latencies)

    def alert_if_needed(self):
        accuracy = self.compute_accuracy()
        avg_latency = self.compute_avg_latency()
        if accuracy is not None and accuracy < 0.7:
            print(f"ALERT: Accuracy dropped below threshold: {accuracy:.2f}")
        if avg_latency is not None and avg_latency > 0.04:
            print(f"ALERT: Latency too high: {avg_latency:.3f} seconds")

# Simulate streaming predictions with monitoring
monitor = ModelMonitor(window_size=50)

for i in range(200):
    input_data = i  # dummy input
    true_label = random.choice([0, 1])  # simulated true label

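    # perf_counter is a monotonic, high-resolution clock suited to timing short intervals
    start_time = time.perf_counter()
    pred = model_predict(input_data)
    latency = time.perf_counter() - start_time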

    monitor.log_prediction(pred, true_label, latency)

    # Report after every 20th prediction (i is zero-based, so i + 1 predictions have been logged)
    if (i + 1) % 20 == 0:
        acc = monitor.compute_accuracy()
        avg_lat = monitor.compute_avg_latency()
        print(f"After {i + 1} predictions - Accuracy: {acc:.2f}, Avg Latency: {avg_lat:.3f} sec")
        monitor.alert_if_needed()

Added a ModelMonitor class to track prediction accuracy and latency over a sliding window.

Logged prediction results and latency for each inference call.

Computed accuracy and average latency periodically.

Added alerts to notify when accuracy drops below 70% or latency exceeds 0.04 seconds.
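
The task also asks for resource usage, which the solution above does not track. One possible way to cover it with only the Python standard library is sketched below. The names measure_resources and fake_inference are hypothetical helpers introduced for illustration, and tracemalloc only sees memory allocated by Python itself, so treat the numbers as a rough signal rather than a full process footprint.

```python
import time
import tracemalloc


def measure_resources(func, *args):
    """Run func(*args) and return (result, stats).

    stats holds wall-clock seconds, CPU seconds, and the peak number of
    bytes allocated by Python while the call was running.
    """
    tracemalloc.start()
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    try:
        result = func(*args)
    finally:
        wall = time.perf_counter() - wall_start
        cpu = time.process_time() - cpu_start
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()  # always stop tracing, even if func raised
    stats = {"wall_seconds": wall, "cpu_seconds": cpu, "peak_bytes": peak}
    return result, stats


def fake_inference(n):
    # Hypothetical stand-in for a model call: allocates a list and sums it.
    values = [float(i) for i in range(n)]
    return sum(values)


result, stats = measure_resources(fake_inference, 1000)
print(f"result={result}, stats={stats}")
```

Memory tracing slows down the traced call, so to respect the constraint on inference speed it is probably better to wrap only a small sample of requests (for example, one in every hundred) than to wrap every call.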

Results Interpretation

Before: No metrics collected, no visibility into model performance.

After: Accuracy and latency metrics are tracked and printed every 20 predictions. Alerts notify when performance degrades.

Adding monitoring and observability allows you to detect and respond to model performance issues in production, improving reliability and trust.
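
An average can hide a small number of very slow requests, so a tail-latency check is a common complement to the average latency alert. The sketch below is an assumption layered on top of the solution, not part of it: LatencyTracker and the 0.05 second threshold are hypothetical, and the 95th percentile is taken from statistics.quantiles with its default method.

```python
import statistics
from collections import deque


class LatencyTracker:
    """Track latencies over a sliding window and flag a high 95th percentile."""

    def __init__(self, window_size=100, p95_threshold=0.05):
        self.latencies = deque(maxlen=window_size)
        self.p95_threshold = p95_threshold  # hypothetical threshold, in seconds

    def log(self, latency):
        self.latencies.append(latency)

    def p95(self):
        # quantiles needs at least two data points on older Python versions
        if len(self.latencies) < 2:
            return None
        # n=100 yields 99 cut points; index 94 is the 95th percentile
        return statistics.quantiles(list(self.latencies), n=100)[94]

    def is_degraded(self):
        value = self.p95()
        return value is not None and value > self.p95_threshold


tracker = LatencyTracker(window_size=100, p95_threshold=0.05)
for _ in range(95):
    tracker.log(0.01)  # mostly fast requests
for _ in range(5):
    tracker.log(0.2)   # a few slow outliers

mean_latency = sum(tracker.latencies) / len(tracker.latencies)
print(f"mean={mean_latency:.4f}s, p95 degraded={tracker.is_degraded()}")
```

In this example the mean stays below the 0.04 second threshold used in the solution, while the percentile check still reports degradation.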

Bonus Experiment

Extend monitoring to include input data distribution tracking to detect data drift.

💡 Hint

Calculate simple statistics like mean and variance of input features over time and alert if they change significantly.
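
A minimal sketch of the hint, assuming a single numeric input feature and a baseline sample saved from training data. DriftMonitor, its thresholds, and the variance-ratio bounds are all hypothetical choices for illustration; the baseline is assumed to have non-zero variance.

```python
import statistics
from collections import deque


class DriftMonitor:
    """Compare a sliding window of one numeric feature against a fixed baseline."""

    def __init__(self, baseline, window_size=50, z_threshold=3.0, max_var_ratio=2.0):
        self.baseline_mean = statistics.fmean(baseline)
        self.baseline_var = statistics.variance(baseline)  # assumed to be > 0
        self.window = deque(maxlen=window_size)
        self.z_threshold = z_threshold
        self.max_var_ratio = max_var_ratio

    def log(self, value):
        self.window.append(value)

    def check(self):
        """Return (mean_z_score, variance_ratio), or None until the window is full."""
        if len(self.window) < self.window.maxlen:
            return None
        values = list(self.window)
        # z-score of the window mean under the baseline distribution
        standard_error = (self.baseline_var / len(values)) ** 0.5
        z_score = abs(statistics.fmean(values) - self.baseline_mean) / standard_error
        var_ratio = statistics.variance(values) / self.baseline_var
        return z_score, var_ratio

    def has_drifted(self):
        stats = self.check()
        if stats is None:
            return False
        z_score, var_ratio = stats
        variance_changed = var_ratio > self.max_var_ratio or var_ratio < 1 / self.max_var_ratio
        return z_score > self.z_threshold or variance_changed


baseline = [float(v) for v in range(10)] * 10  # deterministic stand-in for training data
drift = DriftMonitor(baseline, window_size=50)

for value in [float(v) for v in range(10)] * 5:
    drift.log(value)  # same distribution as the baseline
drifted_before = drift.has_drifted()

for value in [float(v) + 5.0 for v in range(10)] * 5:
    drift.log(value)  # inputs shifted upward by 5
drifted_after = drift.has_drifted()

print(f"before shift: {drifted_before}, after shift: {drifted_after}")
```

With several features, one monitor per feature is the simplest extension; alerts then name the feature whose statistics moved.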

Practice

(1/5)

1. What is the main purpose of monitoring in a software system?

easy

A. To check if the system is working right now

B. To predict future system failures

C. To change system configurations automatically

D. To write new features for the system
