MLOps · Program · Beginner · 3 min read

Python ML Program to Recommend Movies using sklearn

Use sklearn's NearestNeighbors to recommend movies by finding users with similar ratings: fit a model with NearestNeighbors(n_neighbors=2).fit(user_movie_ratings), look up a user's closest neighbors with kneighbors(), and recommend the movies those neighbors rated highly that the user has not rated yet.
📋

Examples

Input: [[5, 4, 0], [4, 0, 3], [0, 2, 5]]
Output: Recommended movies for user 0: Movie indices [2]
Input: [[3, 0, 0], [0, 4, 5], [5, 5, 0]]
Output: Recommended movies for user 1: Movie indices [0]
Input: [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
Output: No recommendations available due to no ratings
🧠

How to Think About It

To recommend movies, we look at a user-movie rating matrix (a rating of 0 means "not rated") and find users with similar tastes using a distance measure. Then we suggest movies that those similar users rated highly and that the current user hasn't rated. We use sklearn's NearestNeighbors to find the similar users based on their rating patterns.
📐

Algorithm

1. Get user-movie rating data as a matrix
2. Fit NearestNeighbors model on this data
3. For a target user, find nearest neighbors (similar users)
4. Collect movies rated highly by neighbors but not by target user
5. Return these movies as recommendations
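
The five steps above can be sketched as a single reusable function. This is a minimal sketch, not a definitive implementation: the function name `recommend` and the parameters `n_neighbors` and `like_threshold` are hypothetical names chosen for illustration, and treating a rating of 0 as "not rated" is an assumption carried over from the sample data.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def recommend(ratings, user_id, n_neighbors=2, like_threshold=3):
    """Return sorted movie indices liked by similar users but unrated by user_id."""
    ratings = np.asarray(ratings)
    # Step 1: with no ratings at all there is nothing to compare
    if not ratings.any():
        return []
    # Step 2: fit on the full matrix using cosine distance
    k = min(n_neighbors, len(ratings))
    model = NearestNeighbors(n_neighbors=k, metric="cosine").fit(ratings)
    # Step 3: nearest neighbors of the target user (the user itself is usually among them)
    _, indices = model.kneighbors(ratings[user_id].reshape(1, -1))
    neighbors = [int(i) for i in indices[0] if i != user_id]
    # Step 4: movies the target has not rated but a neighbor rated highly
    unrated = np.where(ratings[user_id] == 0)[0]
    liked = {int(m) for n in neighbors for m in unrated
             if ratings[n, m] >= like_threshold}
    # Step 5: return the result as a sorted list of plain ints
    return sorted(liked)


print(recommend([[5, 4, 0], [4, 0, 3], [0, 2, 5]], user_id=0))  # [2]
print(recommend([[3, 0, 0], [0, 4, 5], [5, 5, 0]], user_id=1))  # [0]
```

Returning an empty list for an all-zero matrix leaves it to the caller to decide what message to show when no recommendations are available.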
💻

Code

sklearn
from sklearn.neighbors import NearestNeighbors
import numpy as np

# Sample user-movie ratings matrix (rows: users, cols: movies; 0 = not rated)
ratings = np.array([
    [5, 4, 0],
    [4, 0, 3],
    [0, 2, 5]
])

# Cosine distance compares rating patterns rather than rating magnitudes
model = NearestNeighbors(n_neighbors=2, metric='cosine')
model.fit(ratings)

user_id = 0
# Query 2 neighbors: the closest match is user 0 itself, so this yields one other user
distances, indices = model.kneighbors([ratings[user_id]])

# Movies user 0 hasn't rated
user_movies = ratings[user_id]
unrated = np.where(user_movies == 0)[0]

# Recommend movies rated by neighbors but not by user 0
recommendations = set()
for neighbor in indices[0]:
    if neighbor == user_id:  # skip the user itself
        continue
    neighbor_ratings = ratings[neighbor]
    for movie in unrated:
        if neighbor_ratings[movie] >= 3:  # threshold for liking
            # Store a plain int so the printed list reads [2], not [np.int64(2)]
            recommendations.add(int(movie))

print(f"Recommended movies for user {user_id}: Movie indices {sorted(recommendations)}")
Output
Recommended movies for user 0: Movie indices [2]
🔍

Dry Run

Let's trace user 0's recommendations through the code

Step 1: Input ratings matrix

ratings = [[5,4,0],[4,0,3],[0,2,5]]

Step 2: Fit NearestNeighbors model

The model stores the rating rows so that users can be compared by cosine distance

Step 3: Find neighbors for user 0

neighbor indices = [0,1], distances ≈ [0.0, 0.38]

Step 4: Identify movies user 0 hasn't rated

unrated movies = [2]

Step 5: Check neighbors' ratings for unrated movies

Neighbor 1 rated movie 2 as 3 (meets the >= 3 threshold), so movie 2 is added to recommendations

Step 6: Output recommendations

Recommended movies for user 0: [2]

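| Step | User | Neighbors | Unrated Movies | Recommendations |
|------|--------|-----------|----------------|-----------------|
| 1 | User 0 | - | - | - |
| 3 | User 0 | [0,1] | - | - |
| 4 | User 0 | [0,1] | [2] | - |
| 5 | User 0 | [1] | [2] | [2] |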
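
The neighbor distances used in the trace can be checked directly. The sketch below assumes sklearn's `cosine_distances` helper from `sklearn.metrics.pairwise`, which computes 1 minus the cosine similarity between rows, and prints the distance from user 0 to every user, including itself.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_distances

ratings = np.array([[5, 4, 0], [4, 0, 3], [0, 2, 5]])

# Distances from user 0 to every user (index 0 is user 0 itself)
dist = cosine_distances(ratings[[0]], ratings)[0]
print(np.round(dist, 2))  # approximately 0.0, 0.38, 0.77

# The closest other user is the one with the smallest non-self distance
others = [i for i in range(len(ratings)) if i != 0]
closest = min(others, key=lambda i: dist[i])
print(closest)  # 1
```

User 1 is closer to user 0 than user 2 is, which is why only user 1's ratings are consulted when n_neighbors=2.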
💡

Why This Works

Step 1: Modeling user similarity

We use NearestNeighbors with cosine distance to find users with similar movie rating patterns.

Step 2: Finding neighbors

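For a target user, kneighbors() returns the closest users, i.e. those whose rating patterns are most similar. The target user is normally returned as its own nearest neighbor and has to be skipped.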

Step 3: Recommending movies

We recommend movies that neighbors liked (rating >= 3) but the target user hasn't rated yet.

🔄

Alternative Approaches

Collaborative Filtering with Surprise Library
from surprise import Dataset, Reader, KNNBasic
import pandas as pd

# Prepare data
ratings_dict = {'userID': ['A', 'A', 'B', 'B', 'C'], 'itemID': ['M1', 'M2', 'M2', 'M3', 'M1'], 'rating': [5, 3, 4, 2, 4]}
reader = Reader(rating_scale=(1, 5))
data = Dataset.load_from_df(pd.DataFrame(ratings_dict), reader)
trainset = data.build_full_trainset()

# Use KNNBasic algorithm
algo = KNNBasic(sim_options={'name': 'cosine', 'user_based': True})
algo.fit(trainset)

# Predict rating for user 'A' on movie 'M3'
pred = algo.predict('A', 'M3')
print(f"Predicted rating: {pred.est}")
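Surprise is built specifically for recommendation: it works from (user, item, rating) triples, so unrated pairs are never stored, and it predicts a rating rather than just listing neighbors. The trade-off is an extra dependency (the scikit-surprise package).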
Content-Based Filtering
import numpy as np

# Movie features (e.g., genre vectors)
movie_features = np.array([[1,0,0], [0,1,0], [0,0,1]])
user_profile = np.array([1,0,0])  # user likes first genre

# Compute similarity
similarities = movie_features.dot(user_profile)
recommendations = np.argsort(-similarities)
print(f"Recommended movies by content: {recommendations}")
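Recommends based on movie attributes rather than other users' ratings. It is simpler and works without any rating history from other users, but it cannot pick up on the tastes of similar users.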
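
A slightly fuller content-based sketch is shown below. It assumes sklearn's `cosine_similarity` for scoring; the genre vectors and the `seen` set are hypothetical values made up for illustration. Movies with no overlap with the user profile and movies the user has already watched are filtered out.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical genre vectors: one row per movie, one column per genre
movie_features = np.array([[1, 0, 0], [0, 1, 0], [1, 1, 0]])
user_profile = np.array([[1, 0, 0]])  # user likes the first genre
seen = {0}  # hypothetical: movie 0 was already watched

# Similarity between the user profile and every movie
scores = cosine_similarity(user_profile, movie_features)[0]

# Rank by descending score, dropping seen movies and zero-similarity movies
ranked = [int(i) for i in np.argsort(-scores)
          if scores[i] > 0 and int(i) not in seen]
print(f"Recommended movies by content: {ranked}")  # [2]
```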

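Complexity: O(n^2 * m) time to find neighbors for all users, O(n * m) space

Time Complexity

With brute-force search, one kneighbors() query compares the target user against all n users at O(m) per comparison, so a single query costs O(n * m). Repeating this for every user gives O(n^2 * m), where n is the number of users and m is the number of movies.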

Space Complexity

Storing the user-movie matrix requires O(n * m) space, where n is users and m is movies.

Which Approach is Fastest?

Using sklearn's NearestNeighbors is efficient for small datasets; specialized libraries like Surprise optimize sparse data better.

| Approach | Time | Space | Best For |
|----------|------|-------|----------|
| sklearn NearestNeighbors | O(n^2 * m) | O(n * m) | Small to medium datasets |
| Surprise Library Collaborative Filtering | Optimized for sparse data | Sparse storage | Large sparse datasets |
| Content-Based Filtering | O(m) | O(m) | When user ratings are unavailable |
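
If the rating matrix is mostly zeros, the sklearn approach can also be given sparse input. The sketch below assumes SciPy is installed and passes a `csr_matrix` to NearestNeighbors with brute-force search; it is meant as an illustration of the idea, not a benchmark.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.neighbors import NearestNeighbors

ratings = np.array([[5, 4, 0], [4, 0, 3], [0, 2, 5]])
sparse_ratings = csr_matrix(ratings)  # stores only the non-zero ratings

# Brute-force search is requested explicitly for the sparse cosine case
model = NearestNeighbors(n_neighbors=2, metric="cosine", algorithm="brute")
model.fit(sparse_ratings)

# Query with the sparse row for user 0
distances, indices = model.kneighbors(sparse_ratings[0])
neighbor_ids = indices[0].tolist()
print(neighbor_ids)  # [0, 1]
```

The neighbors match the dense version, while memory use scales with the number of stored ratings rather than with n * m.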
💡
Normalize ratings and choose a similarity metric like cosine for better recommendations.
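
The tip does not say which normalization to use. One common choice, assumed here, is mean-centering: subtract each user's average rating from the movies that user actually rated, so that generous and strict raters become comparable. A minimal sketch, treating 0 as "not rated":

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

ratings = np.array([[5, 4, 0], [4, 0, 3], [0, 2, 5]], dtype=float)

rated = ratings > 0                          # mask of actual ratings
counts = np.maximum(rated.sum(axis=1), 1)    # avoid dividing by zero for empty rows
user_means = ratings.sum(axis=1) / counts    # mean over rated movies only

# Subtract each user's mean from rated entries; unrated entries stay 0
centered = np.where(rated, ratings - user_means[:, None], 0.0)
print(centered)

# The centered matrix can be used in place of the raw ratings
model = NearestNeighbors(n_neighbors=2, metric="cosine").fit(centered)
```

After centering, a positive value means "rated above this user's average", so the like threshold would need to be re-expressed on the centered scale (for example, greater than 0).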
⚠️
Not filtering out movies the user already rated leads to recommending seen movies again.