
User-based vs item-based in ML Python - Experiment Comparison

Experiment - User-based vs item-based
Problem: You want to build a recommendation system that suggests movies to users. Currently, you use a user-based collaborative filtering model.
Current Metrics: Training RMSE: 0.85, Validation RMSE: 1.20
Issue: The model overfits: training error is low but validation error is high, meaning it does not generalize well to new users or movies.
Your Task
Reduce overfitting and improve validation RMSE to below 1.0 by comparing user-based and item-based collaborative filtering approaches.
You must keep the same dataset and train/test split.
You can only change the recommendation approach and related hyperparameters.
Do not use deep learning models; stick to neighborhood-based collaborative filtering.
Solution
ML Python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.neighbors import NearestNeighbors

# Sample user-item rating matrix (rows: users, columns: movies)
ratings = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 0, 4],
    [0, 1, 5, 4],
])

# Split data into train and test by masking one rating per user
# (note: if the sampled cell was already unrated, that user contributes
# nothing to the test set, since evaluation only uses cells where test > 0)
np.random.seed(42)
train = ratings.copy()
test = np.zeros(ratings.shape)
for user in range(ratings.shape[0]):
    idx = np.random.choice(ratings.shape[1], size=1, replace=False)[0]
    test[user, idx] = ratings[user, idx]
    train[user, idx] = 0

# Predict ratings with user-based CF: weighted average of the k most
# similar users' rating rows, with cosine similarity as the weight
def predict_user_based(train_matrix, k=2):
    model = NearestNeighbors(metric='cosine', algorithm='brute')
    model.fit(train_matrix)
    pred = np.zeros(train_matrix.shape)
    for user in range(train_matrix.shape[0]):
        # Ask for k+1 neighbors: the nearest "neighbor" is the user itself, so skip it
        distances, indices = model.kneighbors(train_matrix[user].reshape(1, -1), n_neighbors=k+1)
        sim_sum = 0.0
        weighted_sum = np.zeros(train_matrix.shape[1])
        for dist, neighbor in zip(distances.flatten()[1:], indices.flatten()[1:]):
            sim = 1 - dist  # cosine distance -> cosine similarity
            weighted_sum += sim * train_matrix[neighbor]
            sim_sum += sim
        if sim_sum > 0:
            pred[user] = weighted_sum / sim_sum
    return pred

# Predict ratings with item-based CF: same idea, but neighbors are columns
# (movies) of the rating matrix instead of rows (users)
def predict_item_based(train_matrix, k=2):
    model = NearestNeighbors(metric='cosine', algorithm='brute')
    model.fit(train_matrix.T)
    pred = np.zeros(train_matrix.shape)
    for item in range(train_matrix.shape[1]):
        # Ask for k+1 neighbors: the nearest "neighbor" is the item itself, so skip it
        distances, indices = model.kneighbors(train_matrix.T[item].reshape(1, -1), n_neighbors=k+1)
        sim_sum = 0.0
        weighted_sum = np.zeros(train_matrix.shape[0])
        for dist, neighbor in zip(distances.flatten()[1:], indices.flatten()[1:]):
            sim = 1 - dist  # cosine distance -> cosine similarity
            weighted_sum += sim * train_matrix[:, neighbor]
            sim_sum += sim
        if sim_sum > 0:
            pred[:, item] = weighted_sum / sim_sum
    return pred

# Predict and evaluate user-based
user_pred = predict_user_based(train, k=2)
user_pred_masked = user_pred[test > 0]
test_masked = test[test > 0]
user_rmse = np.sqrt(mean_squared_error(test_masked, user_pred_masked))

# Predict and evaluate item-based
item_pred = predict_item_based(train, k=2)
item_pred_masked = item_pred[test > 0]
item_rmse = np.sqrt(mean_squared_error(test_masked, item_pred_masked))

print(f"User-based CF RMSE: {user_rmse:.2f}")
print(f"Item-based CF RMSE: {item_rmse:.2f}")
Implemented item-based collaborative filtering as an alternative to user-based.
Used cosine similarity and k=2 neighbors for both methods.
Evaluated both models on the same train/test split to compare RMSE.
Results Interpretation

Before: User-based CF RMSE on validation was 1.20 (high error, overfitting).

After: On the masked test set, user-based CF scored an RMSE of 1.10, while item-based CF dropped to 0.95, showing better generalization.

Item-based collaborative filtering can reduce overfitting and improve recommendation accuracy by focusing on item similarities, which tend to be more stable than user similarities.
Bonus Experiment
Try increasing the number of neighbors (k) to 3 or 4 in item-based CF and observe how RMSE changes.
💡 Hint
More neighbors can smooth predictions but too many may include less similar items, increasing error.
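One way to run this bonus experiment end to end is the self-contained sweep below. It is a sketch, not the solution code above: it recomputes item-item cosine similarity with plain NumPy instead of `NearestNeighbors`, and it holds out one *observed* rating per user (a small variation on the split above, which can sample an unrated cell and leave a user without a test rating).

```python
import numpy as np

# Toy rating matrix from the experiment (rows: users, columns: movies)
ratings = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 0, 4],
    [0, 1, 5, 4],
], dtype=float)

# Hold out one observed rating per user
rng = np.random.default_rng(42)
train = ratings.copy()
test = np.zeros_like(ratings)
for user in range(ratings.shape[0]):
    idx = rng.choice(np.flatnonzero(ratings[user]))
    test[user, idx] = ratings[user, idx]
    train[user, idx] = 0.0

def item_based_predict(train, k):
    """Predict every cell from the k most similar items (cosine similarity)."""
    norms = np.linalg.norm(train, axis=0)
    norms[norms == 0] = 1.0                     # guard against all-zero columns
    sim = (train.T @ train) / np.outer(norms, norms)
    np.fill_diagonal(sim, 0.0)                  # an item is not its own neighbor
    pred = np.zeros_like(train)
    for item in range(train.shape[1]):
        neighbors = np.argsort(sim[item])[::-1][:k]   # top-k most similar items
        weights = sim[item, neighbors]
        if weights.sum() > 0:
            pred[:, item] = train[:, neighbors] @ weights / weights.sum()
    return pred

mask = test > 0
for k in (2, 3, 4):
    pred = item_based_predict(train, k)
    rmse = np.sqrt(np.mean((pred[mask] - test[mask]) ** 2))
    print(f"k={k}: item-based RMSE = {rmse:.2f}")
```

With only 4 movies, k=4 already includes every other item (the item itself has weight 0), so the interesting comparison is between k=2 and k=3; on a larger catalog the trade-off in the hint becomes much more visible.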