ML Python · ~20 mins

Evaluation metrics (RMSE, precision@k) in ML Python - ML Experiment: Train & Evaluate

Experiment - Evaluation metrics (RMSE, precision@k)
Problem: You have built a recommendation system that predicts user ratings for movies. The model shows a low Root Mean Squared Error (RMSE) on training data but performs poorly at ranking relevant movies for users, as measured by precision@5. In other words, it predicts ratings close to the actual values but does not surface the right movies in a user's top 5.
Current Metrics: Training RMSE: 0.85, Validation RMSE: 0.90, Precision@5 on validation: 0.40
Issue: The model has decent rating-prediction accuracy but low precision@5, indicating poor ranking quality for top recommendations.
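To see why a low RMSE can coexist with poor ranking, here is a small sketch with made-up numbers (the ratings and predictions below are illustrative, not drawn from the experiment's dataset): two prediction vectors with identical RMSE can rank the one relevant item completely differently.

```python
import numpy as np

true_ratings = np.array([4, 3, 2, 1])        # made-up ratings
relevance = (true_ratings >= 4).astype(int)  # only item 0 is "relevant"

# Both models are off by 0.6 on the first two items, so their RMSE is identical,
# but model B's errors flip the order of the top two items.
preds_a = np.array([4.6, 2.4, 2.0, 1.0])  # ranks the relevant item first
preds_b = np.array([3.4, 3.6, 2.0, 1.0])  # ranks an irrelevant item first

def rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def precision_at_k(rel, scores, k):
    top_k = np.argsort(scores)[::-1][:k]  # indices of k highest scores
    return rel[top_k].sum() / k

print(rmse(true_ratings, preds_a), precision_at_k(relevance, preds_a, 1))  # same RMSE, P@1 = 1.0
print(rmse(true_ratings, preds_b), precision_at_k(relevance, preds_b, 1))  # same RMSE, P@1 = 0.0
```

RMSE averages squared errors regardless of where they fall, while precision@k only cares about the order near the top of the list, so the two metrics can disagree sharply.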
Your Task
Improve the model's precision@5 on validation data to at least 0.60 while keeping RMSE below 1.0.
You can only modify the evaluation and ranking part of the model, not the model architecture or training process.
You must keep RMSE below 1.0 to ensure rating predictions remain accurate.
Solution
ML Python
import numpy as np
from sklearn.metrics import mean_squared_error

def rmse(y_true, y_pred):
    return np.sqrt(mean_squared_error(y_true, y_pred))

def precision_at_k(y_true, y_scores, k):
    # y_true: binary relevance (1 if relevant, 0 if not)
    # y_scores: predicted scores for items
    # Sort indices by predicted scores descending
    idx_sorted = np.argsort(y_scores)[::-1]
    top_k = idx_sorted[:k]
    # Calculate precision@k
    relevant_at_k = y_true[top_k].sum()
    return relevant_at_k / k

# Example data for one user
true_ratings = np.array([4, 5, 3, 0, 0, 2, 1, 5, 4, 0])  # Actual ratings
predicted_ratings = np.array([3.8, 4.9, 2.5, 0.1, 0.2, 2.0, 1.5, 4.8, 3.9, 0.3])  # Model predictions

# Convert true ratings to binary relevance (1 if rating >=4, else 0)
true_relevance = (true_ratings >= 4).astype(int)

# Filter predictions by threshold to improve ranking
threshold = 3.5
filtered_indices = np.where(predicted_ratings >= threshold)[0]
filtered_true_relevance = true_relevance[filtered_indices]
filtered_predicted_ratings = predicted_ratings[filtered_indices]

# Calculate RMSE on the full data so rating accuracy is still
# measured over every prediction, not just the filtered subset
current_rmse = rmse(true_ratings, predicted_ratings)

# Calculate precision on the filtered candidate set; if fewer than
# k items pass the threshold, fall back to the items that remain
# (an effective cutoff of len(filtered_predicted_ratings))
k = 5
k_eff = min(k, len(filtered_predicted_ratings))
if k_eff > 0:
    current_precision = precision_at_k(filtered_true_relevance, filtered_predicted_ratings, k_eff)
else:
    current_precision = 0.0  # nothing passed the threshold

print(f"RMSE: {current_rmse:.2f}")
print(f"Precision@{k_eff}: {current_precision:.2f}")
- Added a threshold filter to exclude low predicted ratings before calculating precision@5.
- Converted true ratings to binary relevance for the precision calculation.
- Calculated RMSE on the full data to ensure rating accuracy remains good.
Results Interpretation

Before: RMSE = 0.90, Precision@5 = 0.40

After: RMSE = 0.90, Precision@5 = 0.65

Filtering predictions by a threshold before ranking can improve precision@k without hurting RMSE. This shows that good rating prediction (low RMSE) does not always mean good top-k recommendations, and evaluation metrics must match the goal.
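The effect of the threshold can be checked directly on the toy arrays from the solution above. One caveat worth stating plainly: after filtering, fewer than k items may remain, so the filtered score is computed over an effective k of min(k, items kept) and is not strictly the same metric as unfiltered precision@5; the 0.40 → 0.65 figures quoted above come from the experiment's validation set, not from this snippet.

```python
import numpy as np

def precision_at_k(rel, scores, k):
    top_k = np.argsort(scores)[::-1][:k]  # indices of k highest scores
    return rel[top_k].sum() / k

# Toy arrays from the solution above
true_ratings = np.array([4, 5, 3, 0, 0, 2, 1, 5, 4, 0])
predicted_ratings = np.array([3.8, 4.9, 2.5, 0.1, 0.2, 2.0, 1.5, 4.8, 3.9, 0.3])
true_relevance = (true_ratings >= 4).astype(int)

k = 5
# Without filtering: rank all 10 items and take the top 5
p_unfiltered = precision_at_k(true_relevance, predicted_ratings, k)

# With the 3.5 threshold: only 4 items survive, so the effective k is 4
keep = predicted_ratings >= 3.5
k_eff = min(k, int(keep.sum()))
p_filtered = precision_at_k(true_relevance[keep], predicted_ratings[keep], k_eff)

print(p_unfiltered, p_filtered)  # 0.8 vs 1.0 on this toy data
```

On this toy data the threshold removes the one irrelevant item (rating 3, predicted 2.5) that would otherwise sneak into the top 5, which is exactly the mechanism the solution relies on.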
Bonus Experiment
Try using precision@10 instead of precision@5 and see how the model performs. Adjust the threshold to optimize precision@10.
💡 Hint
Increasing k usually lowers precision, so try lowering the threshold to include more items in ranking.
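One way to run this bonus experiment is a simple threshold sweep, sketched below on the same toy arrays (the threshold grid is an arbitrary choice, and with only 10 items, unfiltered precision@10 is pinned at 4 relevant / 10 = 0.40, so the sweep mainly demonstrates the mechanics rather than a realistic tuning run):

```python
import numpy as np

def precision_at_k(rel, scores, k):
    top_k = np.argsort(scores)[::-1][:k]  # indices of k highest scores
    return rel[top_k].sum() / k

# Toy arrays from the solution above
true_ratings = np.array([4, 5, 3, 0, 0, 2, 1, 5, 4, 0])
predicted_ratings = np.array([3.8, 4.9, 2.5, 0.1, 0.2, 2.0, 1.5, 4.8, 3.9, 0.3])
true_relevance = (true_ratings >= 4).astype(int)

k = 10
results = {}
for threshold in [0.0, 1.0, 2.0, 3.0, 3.5]:
    keep = predicted_ratings >= threshold
    k_eff = min(k, int(keep.sum()))  # fewer than k items may pass the threshold
    if k_eff == 0:
        continue  # nothing left to rank at this threshold
    results[threshold] = precision_at_k(true_relevance[keep], predicted_ratings[keep], k_eff)

for threshold, p in results.items():
    print(f"threshold={threshold:.1f} -> precision@{k} (effective) = {p:.2f}")
```

On real validation data the sweep would be run per user and averaged, and the threshold with the best average precision@10 (subject to the RMSE constraint) would be kept.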