Challenge - 5 Problems
Training Data Pipeline Automation Master
🧠 Conceptual
intermediate · 1:30 time limit
Key benefit of automating training data pipelines
Which of the following is the primary benefit of automating training data pipelines in machine learning projects?
💡 Hint
Think about what automation helps with in repetitive tasks.
Explanation
Automating training data pipelines helps ensure that data preprocessing is consistent and repeatable, which is crucial for reliable model training.
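To make "consistent and repeatable" concrete, here is a minimal sketch, assuming a pipeline built from pure Python functions. The names drop_missing, min_max_scale, PIPELINE and run_pipeline are hypothetical and not part of any library. Because the steps and their order are defined once in code, every scheduled or manual run turns the same raw data into the same output.

```python
from typing import Callable, List, Optional, Sequence

# A step takes a list of values and returns a new list; it keeps no hidden state.
Step = Callable[[List], List]


def drop_missing(values: List[Optional[float]]) -> List[float]:
    # Hypothetical cleaning rule: None marks a missing value.
    return [v for v in values if v is not None]


def min_max_scale(values: List[float]) -> List[float]:
    # Rescale into [0, 1]; an empty input stays empty, a constant input maps to zeros.
    if not values:
        return []
    lo, hi = min(values), max(values)
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in values]


# The step order is fixed in code, so every run applies the same transformations.
PIPELINE: Sequence[Step] = (drop_missing, min_max_scale)


def run_pipeline(values: List, steps: Sequence[Step] = PIPELINE) -> List:
    for step in steps:
        values = step(values)
    return values


print(run_pipeline([4.0, None, 2.0, 6.0]))  # [0.5, 0.0, 1.0]
```

Keeping each step free of side effects is what makes a rerun reproduce earlier results exactly.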
❓ Predict Output
intermediate · 1:30 time limit
Output of data pipeline step code snippet
What is the output of the following Python code simulating a data pipeline step that filters out negative values and scales the rest by 10?
```python
data = [-3, 0, 2, 5]
processed = [x * 10 for x in data if x >= 0]
print(processed)
```
💡 Hint
Look at the condition inside the list comprehension.
Explanation
The code filters out negative numbers (-3) and multiplies the remaining numbers by 10, resulting in [0, 20, 50].
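Wrapping the comprehension in a named function makes the step reusable and testable inside a pipeline. The function name filter_and_scale and its factor parameter are hypothetical. The second line also shows why 0 survives: the condition is x >= 0, not x > 0.

```python
def filter_and_scale(data, factor=10):
    """Hypothetical pipeline step: drop negative values, scale the rest by `factor`."""
    return [x * factor for x in data if x >= 0]


print(filter_and_scale([-3, 0, 2, 5]))  # [0, 20, 50]

# With a strict `> 0` condition, 0 would be dropped as well.
print([x * 10 for x in [-3, 0, 2, 5] if x > 0])  # [20, 50]
```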
❓ Model Choice
advanced · 2:00 time limit
Choosing a model for automated retraining trigger
You want to automate retraining of a model when new data distribution shifts significantly. Which model monitoring technique best supports this automation?
💡 Hint
Think about how to detect when new data is different from old data.
Explanation
Drift detection models monitor changes in input data distribution and can trigger retraining automatically when significant shifts occur.
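One common way to implement such a drift check is the two-sample Kolmogorov-Smirnov (KS) statistic, which compares a feature's distribution in the training (reference) data with recent production data. The sketch below computes the statistic in pure Python. The 0.3 threshold, the function names and the sample values are made up for illustration and would need tuning for a real feature; in practice, scipy.stats.ks_2samp computes the same statistic together with a p-value.

```python
from bisect import bisect_right
from typing import Sequence


def ks_statistic(reference: Sequence[float], current: Sequence[float]) -> float:
    """Largest absolute gap between the empirical CDFs of two samples."""
    ref, cur = sorted(reference), sorted(current)
    gap = 0.0
    # The ECDFs only change at observed points, so checking those points is enough.
    for x in ref + cur:
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_cur = bisect_right(cur, x) / len(cur)
        gap = max(gap, abs(cdf_ref - cdf_cur))
    return gap


DRIFT_THRESHOLD = 0.3  # hypothetical; tune per feature or derive from a p-value


def should_retrain(reference: Sequence[float], current: Sequence[float],
                   threshold: float = DRIFT_THRESHOLD) -> bool:
    """Trigger retraining when the distribution shift exceeds the threshold."""
    return ks_statistic(reference, current) > threshold


# Made-up sample values for one feature.
train_ages = [22, 25, 31, 38, 45, 52, 60]
recent_ages = [48, 55, 61, 63, 70, 72, 75]
print(ks_statistic(train_ages, recent_ages))
print(should_retrain(train_ages, recent_ages))  # True
```

An automated pipeline would run this check on a schedule and start the retraining job only when should_retrain returns True.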
❓ Hyperparameter
advanced · 2:00 time limit
Hyperparameter to optimize for pipeline latency
In an automated training data pipeline, which hyperparameter adjustment can reduce pipeline latency without significantly harming model quality?
💡 Hint
Smaller batches can speed up processing but may affect stability.
Explanation
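Decreasing the batch size lowers per-batch latency: each smaller chunk is preprocessed and handed to the next stage sooner, which speeds up the pipeline without a large loss in model quality. The trade-off is more per-batch overhead, which can lower total throughput, and noisier, less stable updates if the same smaller batch size is also used for training.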
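The latency effect of batch size can be made concrete with a toy cost model. This is a sketch under stated assumptions: every batch pays a fixed overhead (5 ms here) and every record a fixed processing cost (0.25 ms here). Both numbers and the function name pipeline_timing are hypothetical, not measurements.

```python
import math
from typing import Tuple


def pipeline_timing(n_records: int, batch_size: int,
                    per_batch_overhead_ms: float = 5.0,
                    per_record_ms: float = 0.25) -> Tuple[float, float]:
    """Toy cost model with hypothetical costs.

    Returns (latency until the first batch is ready, total time for all records).
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    n_batches = math.ceil(n_records / batch_size)
    # The first batch is ready once its own records are processed.
    first_batch_ms = per_batch_overhead_ms + min(batch_size, n_records) * per_record_ms
    # Every batch pays the overhead, so more batches means more total time.
    total_ms = n_batches * per_batch_overhead_ms + n_records * per_record_ms
    return first_batch_ms, total_ms


for size in (1000, 100):
    first, total = pipeline_timing(10_000, size)
    print(f"batch_size={size}: first batch after {first} ms, all records after {total} ms")
```

With these assumed costs, shrinking batch_size from 1000 to 100 cuts the wait for the first batch from 255 ms to 30 ms but raises the total time from 2550 ms to 3000 ms.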
🔧 Debug
expert · 2:30 time limit
Identifying error in automated data pipeline code
Does the following Python code, an automated data pipeline step that merges two datasets whose key columns have different names, raise an error? If so, which one?
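```python
import pandas as pd

# Two small datasets whose join columns have different names: 'id' vs 'key'.
df1 = pd.DataFrame({'id': [1, 2], 'value': [10, 20]})
df2 = pd.DataFrame({'key': [1, 3], 'score': [100, 300]})

# Join df1.id to df2.key (pd.merge defaults to an inner join).
merged = pd.merge(df1, df2, left_on='id', right_on='key')
print(merged)
```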
💡 Hint
Check the column names used in merge keys.
Explanation
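As written, the code runs without error: left_on='id' and right_on='key' both name columns that exist, so pandas performs an inner join on id == key and prints a single row (id 1, value 10, key 1, score 100); id 2 and key 3 have no match and are dropped. A KeyError occurs only when a merge key names a missing column, for example right_on='id', because the right DataFrame has a 'key' column but no 'id' column.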
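To reproduce the KeyError case and guard against it, here is a sketch using the same two DataFrames. The checked_merge helper is hypothetical, not a pandas API, and assumes each merge key is a single column name. The how and indicator arguments are standard pd.merge parameters.

```python
import pandas as pd

df1 = pd.DataFrame({'id': [1, 2], 'value': [10, 20]})
df2 = pd.DataFrame({'key': [1, 3], 'score': [100, 300]})

# Failing variant: df2 has no 'id' column, so pandas raises KeyError.
try:
    pd.merge(df1, df2, left_on='id', right_on='id')
    caught = None
except KeyError as exc:
    caught = exc
    print("KeyError:", exc)


def checked_merge(left, right, left_on, right_on, **kwargs):
    """Hypothetical guard: fail with a clear message if a single-column key is missing."""
    missing = [col for col, frame in ((left_on, left), (right_on, right))
               if col not in frame.columns]
    if missing:
        raise KeyError(f"merge key column(s) not found: {missing}")
    return pd.merge(left, right, left_on=left_on, right_on=right_on, **kwargs)


# An outer join with indicator=True adds a '_merge' column that shows
# which rows matched ('both') and which did not ('left_only' / 'right_only').
audit = checked_merge(df1, df2, left_on='id', right_on='key',
                      how='outer', indicator=True)
print(audit)
```

In an automated pipeline, counting the 'left_only' and 'right_only' rows after each run surfaces key mismatches that a plain inner join would silently drop.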