
Training data pipeline automation in MLOps - Practice Problems & Coding Challenges

Challenge - 5 Problems
🧠 Conceptual
intermediate
Key benefit of automating training data pipelines
Which of the following is the primary benefit of automating training data pipelines in machine learning projects?
A. Removes the requirement for data versioning
B. Ensures consistent and repeatable data preprocessing steps
C. Automatically improves model accuracy without retraining
D. Eliminates the need for model evaluation
💡 Hint
Think about what automation helps with in repetitive tasks.
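To make the idea of repeatable preprocessing concrete, here is a minimal sketch. The `preprocess` and `fingerprint` helpers and the sample rows are hypothetical, invented for illustration. The sketch runs the same step twice on the same raw data and compares content hashes of the results.

```python
import hashlib
import json


def preprocess(rows):
    """Hypothetical step: drop rows with missing fields, normalize the rest."""
    return [
        {"value": float(r["value"]), "label": r["label"].strip().lower()}
        for r in rows
        if r.get("value") is not None and r.get("label")
    ]


def fingerprint(rows):
    """Content hash of the processed rows, used to compare two runs."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Hypothetical raw input with inconsistent formatting and one missing value.
raw = [
    {"value": "1.5", "label": " Cat "},
    {"value": None, "label": "dog"},
    {"value": 3, "label": "DOG"},
]

first_run = preprocess(raw)
second_run = preprocess(raw)

print(first_run)
print(fingerprint(first_run) == fingerprint(second_run))
```

Because the step is coded rather than performed by hand, both runs produce identical output, and the fingerprint gives a cheap way to check that.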
Predict Output
intermediate
Output of data pipeline step code snippet
What is the output of the following Python code simulating a data pipeline step that filters out negative values and scales the rest by 10?
```python
data = [-3, 0, 2, 5]
processed = [x * 10 for x in data if x >= 0]
print(processed)
```
A. [0, 20, 50]
B. [-30, 0, 20, 50]
C. [0, 2, 5]
D. [30, 0, 20, 50]
💡 Hint
Look at the condition inside the list comprehension.
Model Choice
advanced
Choosing a model for automated retraining trigger
You want to retrain a model automatically when the distribution of incoming data shifts significantly. Which model monitoring technique best supports this automation?
A. Use a drift detection model that monitors input feature distribution changes
B. Use a model that ignores input data changes and retrains on a fixed schedule
C. Use a model that only monitors training loss during initial training
D. Use a model that retrains only when accuracy on training data improves
💡 Hint
Think about how to detect when new data is different from old data.
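As a rough illustration of comparing new data against old data, the sketch below measures how far the mean of a feature has moved, in units of the reference standard deviation. This is a deliberately simple mean-shift heuristic, not a full statistical test. The function names, the sample values, and the `threshold=2.0` default are all assumptions made for this example.

```python
from statistics import mean, stdev


def drift_score(reference, current):
    """Absolute shift of the current mean, in reference standard deviations."""
    ref_std = stdev(reference)
    if ref_std == 0:
        # Degenerate reference: any change in the mean counts as drift.
        return 0.0 if mean(current) == mean(reference) else float("inf")
    return abs(mean(current) - mean(reference)) / ref_std


def should_retrain(reference, current, threshold=2.0):
    """Hypothetical trigger: retrain when the drift score exceeds the threshold."""
    return drift_score(reference, current) > threshold


# Hypothetical values of one input feature.
reference = [10.0, 11.0, 9.0, 10.5, 9.5]    # seen at training time
stable = [10.2, 9.8, 10.1, 9.9, 10.0]       # new data, similar distribution
shifted = [14.0, 15.0, 13.5, 14.5, 15.5]    # new data, clearly shifted

print(should_retrain(reference, stable))
print(should_retrain(reference, shifted))
```

A production monitor would more likely apply a proper two-sample test to each feature and track results over time. The point here is only that the trigger depends on input data, not on a calendar.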
Hyperparameter
advanced
Hyperparameter to optimize for pipeline latency
In an automated training data pipeline, which hyperparameter adjustment can reduce pipeline latency without significantly harming model quality?
A. Add more complex feature engineering steps
B. Increase the number of epochs during model training
C. Decrease batch size during data preprocessing
D. Increase model depth to improve accuracy
💡 Hint
Smaller batches can speed up processing but may affect stability.
🔧 Debug
expert
Identifying error in automated data pipeline code
What error does the following Python code raise when running an automated data pipeline step that merges two datasets with mismatched keys?

```python
import pandas as pd

df1 = {'id': [1, 2], 'value': [10, 20]}
df2 = {'key': [1, 3], 'score': [100, 300]}
df1 = pd.DataFrame(df1)
df2 = pd.DataFrame(df2)
merged = pd.merge(df1, df2, left_on='id', right_on='key')
print(merged)
```
A. TypeError: merge() got an unexpected keyword argument 'left_on'
B. ValueError: columns overlap but no suffix specified
C. No error, prints merged DataFrame
D. KeyError: 'id'
💡 Hint
Check the column names used in merge keys.
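When a pipeline joins datasets whose key values only partly overlap, it can help to audit which rows found no partner. The sketch below reuses the two frames from the question and relies on pandas' `how="outer"` and `indicator=True` merge options. The `audit`, `unmatched`, and `n_unmatched` names are introduced here for illustration only.

```python
import pandas as pd

df1 = pd.DataFrame({"id": [1, 2], "value": [10, 20]})
df2 = pd.DataFrame({"key": [1, 3], "score": [100, 300]})

# An outer join keeps every row from both sides; indicator=True adds a
# "_merge" column saying whether each row came from left, right, or both.
audit = pd.merge(
    df1, df2, left_on="id", right_on="key", how="outer", indicator=True
)

# Rows whose key value exists on only one side of the join.
unmatched = audit[audit["_merge"] != "both"]
n_unmatched = len(unmatched)

print(unmatched)
print(n_unmatched)
```

An automated step could log `n_unmatched` or fail the run when it exceeds an agreed limit, so that silently dropped rows do not go unnoticed.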