Practice


1. What is the main purpose of setting an alert threshold in MLOps monitoring?

easy

A. To group multiple alerts into a single notification

B. To specify when a warning or alert should be triggered based on metric values

C. To define the actions taken after an alert is triggered

D. To store historical data of model performance

Solution

Step 1: Understand alert threshold concept
An alert threshold sets a limit on a metric value that, when crossed, triggers an alert.
Step 2: Differentiate from policies and actions
Policies group conditions and actions, but thresholds specifically define when alerts fire.
Final Answer:
To specify when a warning or alert should be triggered based on metric values -> Option B
Quick Check:
Alert threshold = trigger point [OK]

Hint: Thresholds set alert trigger points based on metrics [OK]

Common Mistakes:

Confusing thresholds with alert grouping
Thinking thresholds define actions
Assuming thresholds store data
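Here is a minimal sketch of the idea. The `Threshold` class and its `is_breached` method are hypothetical and do not come from any monitoring tool's API. A threshold reduces to a single comparison that decides whether an alert fires. What happens afterwards is left to the policy's actions.

```python
from dataclasses import dataclass


@dataclass
class Threshold:
    """Hypothetical threshold: fire when the metric exceeds `limit`."""
    metric: str
    limit: float

    def is_breached(self, value: float) -> bool:
        # The threshold only answers "should an alert fire?".
        # What happens next (notify, page, etc.) belongs to the policy's actions.
        return value > self.limit


cpu = Threshold(metric="cpu_usage", limit=80)
print(cpu.is_breached(92))  # True: alert fires
print(cpu.is_breached(45))  # False: no alert
```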

2. Which of the following is the correct way to define an alert threshold for CPU usage exceeding 80% in a YAML policy?

easy

A. threshold: { metric: 'cpu_usage', operator: '>', value: 80 }

B. threshold: { metric: 'cpu_usage', operator: '<', value: 80 }

C. threshold: { metric: 'cpu_usage', operator: '=', value: 80 }

D. threshold: { metric: 'cpu_usage', operator: '!=', value: 80 }

Solution

Step 1: Identify the correct operator for exceeding 80%
Exceeding means greater than, so operator should be '>'.
Step 2: Match metric and value correctly
The metric is 'cpu_usage' and the value is 80 (assuming cpu_usage is reported as a percentage from 0 to 100), so the matching definition is threshold: { metric: 'cpu_usage', operator: '>', value: 80 }.
Final Answer:
threshold: { metric: 'cpu_usage', operator: '>', value: 80 } -> Option A
Quick Check:
Exceeding 80% means operator '>' [OK]

Hint: Use '>' operator for thresholds exceeding a value [OK]

Common Mistakes:

Using '<' instead of '>' for exceeding
Using '=', which triggers only when the value is exactly 80
Using '!=', which triggers for every value except exactly 80
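This sketch shows why only option A matches "exceeding 80%". It evaluates each option's threshold against a few sample CPU readings. The `OPS` table and the `fires` helper are illustrative assumptions that mirror the `metric`/`operator`/`value` fields from the options. They are not a specific tool's API.

```python
import operator

# Hypothetical mapping from the operator strings in options A-D to Python comparisons.
OPS = {">": operator.gt, "<": operator.lt, "=": operator.eq, "!=": operator.ne}


def fires(threshold: dict, value: float) -> bool:
    """Evaluate one threshold dict like {'metric': ..., 'operator': ..., 'value': ...}."""
    return OPS[threshold["operator"]](value, threshold["value"])


options = {
    "A": {"metric": "cpu_usage", "operator": ">", "value": 80},
    "B": {"metric": "cpu_usage", "operator": "<", "value": 80},
    "C": {"metric": "cpu_usage", "operator": "=", "value": 80},
    "D": {"metric": "cpu_usage", "operator": "!=", "value": 80},
}

# Which of the sample readings 50, 80 and 95 would trigger each option?
for name, t in options.items():
    print(name, [v for v in (50, 80, 95) if fires(t, v)])
# A [95]
# B [50]
# C [80]
# D [50, 95]
```

Only option A fires for readings above 80 and stays quiet at or below it.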

3. Given this alert policy snippet:

thresholds:
  - metric: 'latency'
    operator: '>'
    value: 200
actions:
  - notify: 'on-call-team'

What happens when latency reaches 250?

medium

A. The alert triggers but no notification is sent

B. No alert is triggered because 250 is less than 200

C. An alert is triggered and the on-call team is notified

D. The system ignores the latency metric

Solution

Step 1: Analyze threshold condition
The threshold triggers when latency > 200. Since 250 > 200, the condition is met.
Step 2: Check actions on trigger
The policy's action is to notify 'on-call-team', so a notification is sent.
Final Answer:
An alert is triggered and the on-call team is notified -> Option C
Quick Check:
Latency 250 > 200 triggers alert and notify [OK]

Hint: Check if metric value crosses threshold to trigger alerts [OK]

Common Mistakes:

Misreading operator direction
Ignoring actions linked to alerts
Assuming a notification needs a separate explicit command even though the policy already lists it as an action
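Here is a sketch of the policy from this question. It assumes a hypothetical evaluator that fires when any listed threshold is crossed and then runs the policy's `notify` actions. The dict mirrors the YAML snippet. Loading the real YAML would need a third-party parser such as PyYAML, which this sketch avoids.

```python
import operator

# Hypothetical operator table covering the operators used in this policy format.
OPS = {">": operator.gt, "<": operator.lt}

# The question's policy, written as a Python dict instead of YAML.
policy = {
    "thresholds": [{"metric": "latency", "operator": ">", "value": 200}],
    "actions": [{"notify": "on-call-team"}],
}


def evaluate(policy: dict, metrics: dict) -> list:
    """Return the notification targets if a threshold fires, else an empty list."""
    triggered = any(
        OPS[t["operator"]](metrics[t["metric"]], t["value"])
        for t in policy["thresholds"]
    )
    if not triggered:
        return []
    # Actions run only after the threshold check succeeds.
    return [a["notify"] for a in policy["actions"] if "notify" in a]


print(evaluate(policy, {"latency": 250}))  # ['on-call-team']
print(evaluate(policy, {"latency": 150}))  # []
```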

4. You have this alert policy configuration:

thresholds:
  - metric: 'error_rate'
    operator: '>'
    value: 5
actions:
  - notify: 'dev-team'

But alerts never trigger even when error_rate is 10. What is the likely issue?

medium

A. The operator should be '<' instead of '>'

B. Notifications require a separate enable flag

C. The value 5 is too high to trigger alerts

D. The metric name might be misspelled or mismatched

Solution

Step 1: Verify operator and value logic
Operator '>' with value 5 means the alert triggers if error_rate > 5, so a value of 10 should trigger an alert.
Step 2: Check metric name correctness
If alerts never trigger, a common cause is a misspelled or mismatched metric name. The rule then never matches any incoming data, so it can never fire.
Final Answer:
The metric name might be misspelled or mismatched -> Option D
Quick Check:
Metric name mismatch blocks alert triggers [OK]

Hint: Check metric names carefully if alerts don't trigger [OK]

Common Mistakes:

Changing operator incorrectly
Assuming threshold value is too high
Forgetting to enable notifications
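A silent metric-name mismatch is easy to reproduce. In this sketch, the helpers `evaluate` and `missing_metrics` and the misspelled name `errorRate` are all hypothetical. When a threshold's metric is absent from the incoming data, `evaluate` returns `None` instead of firing. A separate check lists thresholds that have no matching data, which makes the problem visible rather than silent.

```python
import operator
from typing import Dict, List, Optional

# Hypothetical operator table; only the operators used here are included.
OPS = {">": operator.gt, "<": operator.lt}


def evaluate(threshold: dict, metrics: Dict[str, float]) -> Optional[bool]:
    """Return True/False if the metric has data, or None if no data matches the name."""
    value = metrics.get(threshold["metric"])
    if value is None:
        # No sample under this exact name, so the threshold can never fire.
        return None
    return OPS[threshold["operator"]](value, threshold["value"])


def missing_metrics(thresholds: List[dict], metrics: Dict[str, float]) -> List[str]:
    """List threshold metric names that have no matching data."""
    return [t["metric"] for t in thresholds if t["metric"] not in metrics]


threshold = {"metric": "error_rate", "operator": ">", "value": 5}

print(evaluate(threshold, {"error_rate": 10}))          # True
print(evaluate(threshold, {"errorRate": 10}))           # None: name mismatch, never fires
print(missing_metrics([threshold], {"errorRate": 10}))  # ['error_rate']
```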

5. You want to create a policy that triggers an alert if either model accuracy drops below 90% or latency exceeds 300ms. Which configuration correctly defines this combined alert policy?

hard

A.
thresholds:
  - metric: 'accuracy'
    operator: '<'
    value: 90
  - metric: 'latency'
    operator: '>'
    value: 300
actions:
  - notify: 'ml-team'

B.
thresholds:
  - metric: 'accuracy'
    operator: '>'
    value: 90
  - metric: 'latency'
    operator: '<'
    value: 300
actions:
  - notify: 'ml-team'

C.
thresholds:
  - metric: 'accuracy'
    operator: '<'
    value: 90
    condition: 'AND'
  - metric: 'latency'
    operator: '>'
    value: 300
actions:
  - notify: 'ml-team'

D.
thresholds:
  - metric: 'accuracy'
    operator: '<'
    value: 90
    condition: 'OR'
  - metric: 'latency'
    operator: '>'
    value: 300
actions:
  - notify: 'ml-team'

Solution

Step 1: Identify correct operators for conditions
Accuracy below 90% means operator '<'; latency exceeding 300 ms means operator '>'.
Step 2: Understand default logical grouping
In the policy format used in these questions, thresholds listed together are combined with OR: the alert fires if any one condition is met. Real alerting platforms differ, and some require you to choose AND or OR explicitly, so confirm the default in your tool.
Step 3: Verify options for logical conditions
A 'condition' key attached to an individual threshold is not part of this policy format, which rules out options C and D. Option C would also require both conditions at once. Option B reverses the operators, using '>' for accuracy and '<' for latency.
Final Answer:
Thresholds 'accuracy' < 90 and 'latency' > 300 listed together, with action notify: 'ml-team' -> Option A
Quick Check:
Correct operators + default OR logic = Option A [OK]

Hint: Use correct operators and list thresholds for OR logic [OK]

Common Mistakes:

Using wrong operators for conditions
Adding unsupported 'condition' keys
Assuming AND logic without explicit config
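You can check the difference between OR and AND grouping directly. This sketch adds a hypothetical `combiner` argument so the two behaviours can be compared on a sample where only accuracy has degraded. The `combiner` argument is not a field of the policy format in this question.

```python
import operator

# Hypothetical operator table for the two operators in option A.
OPS = {">": operator.gt, "<": operator.lt}

# Option A's thresholds, as Python dicts.
thresholds = [
    {"metric": "accuracy", "operator": "<", "value": 90},
    {"metric": "latency", "operator": ">", "value": 300},
]


def fires(metrics: dict, combiner: str = "OR") -> bool:
    """Combine per-threshold results with a hypothetical OR/AND combiner."""
    results = [OPS[t["operator"]](metrics[t["metric"]], t["value"]) for t in thresholds]
    if combiner == "OR":
        return any(results)  # fire if any single condition is met
    if combiner == "AND":
        return all(results)  # fire only if every condition is met
    raise ValueError(f"unknown combiner: {combiner!r}")


sample = {"accuracy": 87, "latency": 120}  # accuracy degraded, latency healthy
print(fires(sample, "OR"))   # True
print(fires(sample, "AND"))  # False
```

Under AND, this degraded model would go unnoticed. That is why the question's "either ... or" requirement needs OR grouping.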

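Alert thresholds and policies in MLOps - Time & Space Complexity

Evaluating every threshold of every policy once costs roughly one comparison per threshold. The work therefore grows as (number of policies) x (thresholds per policy):

Input size (policies x thresholds per policy)	Approx. threshold checks
10 policies x 5 thresholds	50 checks
100 policies x 5 thresholds	500 checks
100 policies x 100 thresholds	10,000 checks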
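You can reproduce the check counts in the table with a naive evaluator that compares every threshold of every policy once per pass. The `evaluate_all` and `make_policies` helpers and the placeholder metric name `m` are hypothetical. The loop deliberately does not short-circuit, so it counts the full policies x thresholds comparisons.

```python
import operator

# Hypothetical operator table; only '>' is needed for this demo.
OPS = {">": operator.gt}


def evaluate_all(policies, metrics):
    """Evaluate every threshold of every policy once.

    Returns (indices of policies with at least one breach, number of comparisons).
    No short-circuiting, so the count is the full policies x thresholds total.
    """
    checks = 0
    fired = []
    for pid, thresholds in enumerate(policies):
        breached = False
        for t in thresholds:
            checks += 1
            if OPS[t["operator"]](metrics[t["metric"]], t["value"]):
                breached = True
        if breached:
            fired.append(pid)
    return fired, checks


def make_policies(num_policies, thresholds_per_policy):
    """Build hypothetical policies that all watch the placeholder metric 'm'."""
    return [
        [{"metric": "m", "operator": ">", "value": 0} for _ in range(thresholds_per_policy)]
        for _ in range(num_policies)
    ]


for p, t in [(10, 5), (100, 5), (100, 100)]:
    _, checks = evaluate_all(make_policies(p, t), {"m": 1})
    print(f"{p} policies x {t} thresholds: {checks} checks")
```

An evaluator that stops at the first breached threshold under OR logic can do fewer comparisons in practice, but the worst case is still one check per threshold. Storing the configuration also takes space proportional to the same policies x thresholds total.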
