What is the main purpose of regression testing when applied to data science models?
Think about what regression testing means in software and how it applies to models.
Regression testing ensures that recent changes do not break or degrade existing functionality, including model predictions.
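As a minimal sketch of this idea (the names, values, and tolerance here are illustrative, not from the question), a regression test can compare a model's new predictions against a stored baseline:

```python
# Hypothetical regression check for model predictions (values are illustrative).
baseline_preds = [0.8, 0.6, 0.9, 0.4]   # saved from the approved model version
new_preds = [0.8, 0.61, 0.9, 0.4]       # produced by the updated model
tolerance = 0.05                         # maximum acceptable per-prediction drift

def has_regressed(old, new, tol):
    """Return True if any prediction drifts more than tol from its baseline."""
    return any(abs(o - n) > tol for o, n in zip(old, new))

print(has_regressed(baseline_preds, new_preds, tolerance))  # False: 0.01 drift is within tolerance
```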
Given the following Python code that compares old and new model predictions, what is the output?
old_preds = [0.8, 0.6, 0.9, 0.4]
new_preds = [0.8, 0.65, 0.9, 0.4]
threshold = 0.05
changes = [abs(o - n) for o, n in zip(old_preds, new_preds)]
regression_fail = any(change > threshold for change in changes)
print(regression_fail)
Check if any prediction difference exceeds the threshold.
Mathematically the differences are 0.0, 0.05, 0.0, 0.0, and 0.05 is not strictly greater than the threshold. In binary floating point, however, abs(0.6 - 0.65) evaluates to 0.050000000000000044, which does exceed 0.05, so regression_fail is True and the code prints True.
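A subtlety worth noting: in binary floating point, abs(0.6 - 0.65) does not come out to exactly 0.05 but lands slightly above it. A defensive variant (the rounding precision here is an illustrative choice, not part of the question) makes the comparison behave like exact decimal arithmetic:

```python
old_preds = [0.8, 0.6, 0.9, 0.4]
new_preds = [0.8, 0.65, 0.9, 0.4]
threshold = 0.05

raw = [abs(o - n) for o, n in zip(old_preds, new_preds)]
print(raw[1] > threshold)  # True: the float difference lands slightly above 0.05

# Rounding the differences before comparing removes the float artifact.
changes = [round(c, 9) for c in raw]
regression_fail = any(change > threshold for change in changes)
print(regression_fail)  # False: exactly-at-threshold no longer fails
```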
You have two sets of model evaluation metrics stored in dictionaries. Which option correctly identifies if any metric has regressed (got worse)?
old_metrics = {'accuracy': 0.92, 'precision': 0.85, 'recall': 0.88}
new_metrics = {'accuracy': 0.90, 'precision': 0.87, 'recall': 0.85}
regressions = {k: new_metrics[k] < old_metrics[k] for k in old_metrics}
print(regressions)
Compare each metric's new value to the old value to see if it decreased.
Accuracy dropped from 0.92 to 0.90 (True), precision increased from 0.85 to 0.87 (False), and recall dropped from 0.88 to 0.85 (True), so the output is {'accuracy': True, 'precision': False, 'recall': True}.
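A strict less-than check flags even tiny, noise-level dips. A hedged variant (the tolerance value below is illustrative, not from the question) only flags drops larger than an agreed margin:

```python
old_metrics = {'accuracy': 0.92, 'precision': 0.85, 'recall': 0.88}
new_metrics = {'accuracy': 0.90, 'precision': 0.87, 'recall': 0.85}
tolerance = 0.01  # allow up to one point of noise before flagging a regression

regressions = {
    k: (old_metrics[k] - new_metrics[k]) > tolerance
    for k in old_metrics
}
print(regressions)  # {'accuracy': True, 'precision': False, 'recall': True}
```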
What error does the following regression test code raise?
old_results = [0.7, 0.8, 0.9]
new_results = [0.7, 0.85]
for i in range(len(old_results)):
    if abs(old_results[i] - new_results[i]) > 0.1:
        print('Regression detected')
Check if the loop index matches the length of both lists.
new_results has only two elements while old_results has three, so the final loop iteration accesses new_results[2] and raises an IndexError.
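One way to make the comparison safe (the helper name here is hypothetical) is to validate the lengths first and then iterate with zip, which avoids indexing past the shorter list:

```python
def check_regressions(old_results, new_results, threshold=0.1):
    """Compare results element-wise, failing fast on a length mismatch."""
    if len(old_results) != len(new_results):
        raise ValueError(
            f"Length mismatch: {len(old_results)} old vs {len(new_results)} new results"
        )
    # zip is safe here because the lengths were validated above.
    return [abs(o - n) > threshold for o, n in zip(old_results, new_results)]

print(check_regressions([0.7, 0.8, 0.9], [0.7, 0.85, 0.7]))  # [False, False, True]
```

Note that zip alone would silently truncate to the shorter list, hiding missing predictions; the explicit length check turns that silent data loss into a loud failure.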
You update a machine learning model frequently with new data. Which regression testing strategy best ensures that model performance does not degrade over time?
Think about automation and consistent checks for regression.
Automated tests comparing predictions to a baseline help catch unintended degradations quickly and reliably.
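As an illustrative sketch of such an automated check (the metric names, baseline values, and tolerance are hypothetical), a CI job could fail the build whenever a tracked metric falls below its frozen baseline:

```python
# Hypothetical automated baseline check, e.g. run in CI after each retrain.
BASELINE = {'accuracy': 0.92, 'f1': 0.88}   # frozen metrics from the approved model
TOLERANCE = 0.005                            # allowed drop before the check fails

def assert_no_regression(current_metrics, baseline=BASELINE, tol=TOLERANCE):
    """Raise AssertionError if any metric falls more than tol below its baseline."""
    failures = [
        f"{name}: {baseline[name]:.3f} -> {current_metrics[name]:.3f}"
        for name in baseline
        if current_metrics[name] < baseline[name] - tol
    ]
    if failures:
        raise AssertionError("Model regressed on: " + ", ".join(failures))

assert_no_regression({'accuracy': 0.921, 'f1': 0.879})  # passes: within tolerance
```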