
Platform observability and SLAs in MLOps - Practice Problems & Coding Challenges

Challenge - 5 Problems
🧠 Conceptual
intermediate
Understanding Service Level Agreements (SLAs)

Which statement best describes the purpose of an SLA in platform observability?

A. An SLA defines the expected uptime and performance targets agreed upon between service provider and user.
B. An SLA is a tool used to monitor the internal logs of a platform without user involvement.
C. An SLA is a software that automatically fixes bugs in the platform code.
D. An SLA is a backup system that stores data in case of platform failure.
💡 Hint

Think about what agreements between users and providers usually include.
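
To make the idea of an uptime target concrete, here is a minimal Python sketch that turns a target into a downtime allowance. The function name `allowed_downtime_minutes` and the 30-day period are illustrative assumptions, not part of the challenge.

```python
def allowed_downtime_minutes(uptime_target: float, period_days: int = 30) -> float:
    """Return the downtime (in minutes) permitted by an uptime target.

    uptime_target is a ratio, e.g. 0.999 for 99.9%.
    period_days is an assumed measurement window (30 days here).
    """
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - uptime_target)


# A 99.9% target over an assumed 30-day window.
print(round(allowed_downtime_minutes(0.999), 1))  # 43.2
```

Under these assumptions, a 99.9% target leaves roughly 43 minutes of downtime per 30 days.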

💻 Command Output
intermediate
Interpreting Observability Metrics Output

Given the following Prometheus query output, which reports the error rate over the last 5 minutes as a decimal ratio, what is the error rate as a percentage?

error_rate{service="ml-model"} 0.02
A. 2%
B. 0.2%
C. 20%
D. 0.02%
💡 Hint

Remember to convert decimal to percentage by multiplying by 100.
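
The conversion in the hint can be sketched in Python as follows. The helper names `parse_sample` and `ratio_to_percent` are hypothetical, the sample line is made up and deliberately differs from the one in the question, and the parser assumes a simple `name{labels} value` line with no trailing timestamp.

```python
def parse_sample(line: str) -> tuple[str, float]:
    """Split a 'name{labels} value' line into (series, value).

    Assumes the value is the last whitespace-separated token.
    """
    series, _, value = line.rpartition(" ")
    return series, float(value)


def ratio_to_percent(ratio: float) -> float:
    """Convert a decimal ratio (0 to 1) into a percentage (0 to 100)."""
    return ratio * 100


# Hypothetical sample line, not the one from the question.
series, ratio = parse_sample('error_rate{service="demo"} 0.005')
print(f"{series}: {ratio_to_percent(ratio):.2f}%")
```

With this made-up sample, the script prints `error_rate{service="demo"}: 0.50%`.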

🔀 Workflow
advanced
Setting Up Alerting for SLA Breach

Which sequence of steps correctly sets up an alert in a monitoring system when the platform's uptime falls below 99.9%?

A. 1,3,2,4
B. 2,1,3,4
C. 1,2,3,4
D. 3,1,2,4
💡 Hint

Think about logical order: metric first, then alert, then notification, then test.
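
The order described in the hint (metric, alert, notification, test) can be illustrated with a small self-contained Python sketch. It is not tied to any real monitoring tool: `AlertRule`, `uptime_ratio`, and the in-memory `sent` list standing in for a notification channel are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


def uptime_ratio(up_seconds: float, total_seconds: float) -> float:
    """Metric: fraction of the window during which the platform was up."""
    return up_seconds / total_seconds


@dataclass
class AlertRule:
    """Alert: fires when the metric drops below the threshold."""
    name: str
    threshold: float                 # minimum acceptable uptime ratio
    notify: Callable[[str], None]    # notification channel (assumed callable)

    def evaluate(self, value: float) -> bool:
        if value < self.threshold:
            self.notify(f"{self.name}: uptime {value:.4%} is below {self.threshold:.1%}")
            return True
        return False


# Notification: a list stands in for email, chat, or paging.
sent: List[str] = []
rule = AlertRule(name="UptimeSLABreach", threshold=0.999, notify=sent.append)

# Test: feed a simulated breach (10 s down in an hour) and a healthy hour.
fired_on_breach = rule.evaluate(uptime_ratio(3590, 3600))
fired_on_healthy = rule.evaluate(uptime_ratio(3600, 3600))
print(fired_on_breach, fired_on_healthy, sent)
```

In a real setup each piece would live in the monitoring system's own configuration; the sketch only shows how the four stages depend on one another.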

Troubleshoot
advanced
Diagnosing Missing Observability Data

You notice that your observability dashboard shows no data for the last hour. Which is the most likely cause?

A. The platform uptime is 100%, so no data is needed.
B. The SLA agreement expired and disabled monitoring.
C. The notification system is down, so data is not displayed.
D. The data collection agent crashed or stopped running.
💡 Hint

Consider what collects and sends data to the dashboard.
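
One way to catch a gap like this early is a staleness check on the most recent sample. The following is a minimal Python sketch; the function `is_stale` and the five-minute `max_age` default are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_stale(
    last_sample: Optional[datetime],
    now: datetime,
    max_age: timedelta = timedelta(minutes=5),  # assumed tolerance
) -> bool:
    """Return True if no sample exists or the newest one is older than max_age."""
    if last_sample is None:
        return True
    return now - last_sample > max_age


# Fixed reference time so the example is reproducible.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(is_stale(now - timedelta(hours=1), now))      # True: silent for an hour
print(is_stale(now - timedelta(seconds=30), now))   # False: fresh data
```

A check of this kind reports on the collection pipeline itself, independently of the values the metrics would otherwise carry.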

Best Practice
expert
Choosing Metrics for SLA Compliance Monitoring

Which metric is the best one to monitor for compliance with a 99.9% uptime SLA on an ML platform?

A. Average response time of the ML model predictions.
B. Percentage of successful requests over total requests in a time window.
C. CPU usage percentage of the ML model server.
D. Total number of requests processed per minute.
💡 Hint

Think about what directly reflects availability and success rate.
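
A request-based availability check can be sketched in Python as below. The names `availability` and `meets_sla`, the request counts, and the choice to treat a window with no traffic as compliant are all assumptions made for the example.

```python
def availability(successful: int, total: int) -> float:
    """Fraction of requests in the window that succeeded."""
    if total == 0:
        # Assumption: a window with no traffic does not count against the SLA.
        return 1.0
    return successful / total


def meets_sla(successful: int, total: int, target: float = 0.999) -> bool:
    """Compare measured availability with the target ratio (99.9% by default)."""
    return availability(successful, total) >= target


# Made-up request counts for two time windows.
print(meets_sla(99_950, 100_000))  # True: 99.95% succeeded
print(meets_sla(99_800, 100_000))  # False: 99.80% succeeded
```

Measuring over a fixed window keeps the check comparable from one reporting period to the next.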