Python ML Program to Recommend Movies using sklearn
Fit NearestNeighbors(n_neighbors=2) on a user-movie ratings matrix and find similar users with kneighbors(), then recommend the movies those neighbors liked but the target user hasn't rated.
How to Think About It
If two users rated their shared movies similarly, the movies one of them liked are good candidates for the other. Nearest-neighbor search over the rows of the ratings matrix finds those similar users.
Algorithm
1. Build a user-movie ratings matrix (0 means unrated).
2. Fit NearestNeighbors with the cosine metric on the user rows.
3. Query kneighbors() for the target user to get the most similar users.
4. Collect the movies the target user hasn't rated.
5. Recommend any unrated movie that a neighbor rated at or above a liking threshold (3 here).
Code
```python
from sklearn.neighbors import NearestNeighbors
import numpy as np

# Sample user-movie ratings matrix (rows: users, cols: movies)
ratings = np.array([
    [5, 4, 0],
    [4, 0, 3],
    [0, 2, 5]
])

model = NearestNeighbors(n_neighbors=2, metric='cosine')
model.fit(ratings)

user_id = 0
# Find 2 nearest neighbors for user 0 (the user itself is included)
distances, indices = model.kneighbors([ratings[user_id]])

# Movies user 0 hasn't rated
user_movies = ratings[user_id]
unrated = np.where(user_movies == 0)[0]

# Recommend movies rated by neighbors but not by user 0
recommendations = set()
for neighbor in indices[0][1:]:  # skip the user itself
    neighbor_ratings = ratings[neighbor]
    for movie in unrated:
        if neighbor_ratings[movie] >= 3:  # threshold for liking
            recommendations.add(movie)

print(f"Recommended movies for user {user_id}: Movie indices {sorted(recommendations)}")
```
Dry Run
Let's trace user 0's recommendations through the code
1. Input ratings matrix: ratings = [[5,4,0],[4,0,3],[0,2,5]]
2. Fit the NearestNeighbors model: the model indexes the user rows so cosine distances between rating vectors can be queried.
3. Find neighbors for user 0: indices = [0, 1], distances ≈ [0.0, 0.38] (user 0 is always its own nearest neighbor at distance 0).
4. Identify movies user 0 hasn't rated: unrated = [2].
5. Check neighbors' ratings for unrated movies: neighbor 1 rated movie 2 as 3, which meets the liking threshold, so movie 2 is added.
6. Output: Recommended movies for user 0: Movie indices [2]
| Step | User | Neighbors | Unrated Movies | Recommendations |
|---|---|---|---|---|
| 1. Fit model | User 0 | - | - | - |
| 2. Find neighbors | User 0 | [0, 1] | - | - |
| 3. Find unrated movies | User 0 | [0, 1] | [2] | - |
| 4. Recommend | User 0 | [1] (self skipped) | [2] | [2] |
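The numbers above can be reproduced directly. This sketch reruns just the neighbor query; the non-zero distance is 1 minus the cosine similarity between users 0 and 1:

```python
from sklearn.neighbors import NearestNeighbors
import numpy as np

ratings = np.array([[5, 4, 0], [4, 0, 3], [0, 2, 5]])
model = NearestNeighbors(n_neighbors=2, metric='cosine').fit(ratings)

# Query with a 2D row (shape (1, 3)) for user 0
distances, indices = model.kneighbors(ratings[[0]])

print(indices[0])    # user 0 itself comes first, then its nearest neighbor
print(distances[0])  # cosine distances, ~[0.0, 0.38]
print(np.where(ratings[0] == 0)[0])  # movies user 0 hasn't rated
```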
Why This Works
Step 1: Modeling user similarity
We use NearestNeighbors with cosine distance to find users with similar movie rating patterns.
Step 2: Finding neighbors
For a target user, we use kneighbors() to find the users whose rating vectors are most similar.
Step 3: Recommending movies
We recommend movies that neighbors liked (rating >= 3) but the target user hasn't rated yet.
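The cosine distance in Step 1 can be computed by hand to see why user 1 ends up closer to user 0 than user 2 (a minimal numpy sketch of the same metric NearestNeighbors uses):

```python
import numpy as np

ratings = np.array([[5, 4, 0], [4, 0, 3], [0, 2, 5]])

def cosine_distance(a, b):
    # 1 - cosine similarity between two rating vectors
    return 1 - a.dot(b) / (np.linalg.norm(a) * np.linalg.norm(b))

d01 = cosine_distance(ratings[0], ratings[1])  # ~0.38
d02 = cosine_distance(ratings[0], ratings[2])  # ~0.77
print(d01 < d02)  # user 1 is the nearer neighbor of user 0
```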
Alternative Approaches
Collaborative filtering with the Surprise library:
```python
from surprise import Dataset, Reader, KNNBasic
import pandas as pd

# Prepare data in (user, item, rating) form
ratings_dict = {
    'userID': ['A', 'A', 'B', 'B', 'C'],
    'itemID': ['M1', 'M2', 'M2', 'M3', 'M1'],
    'rating': [5, 3, 4, 2, 4]
}
reader = Reader(rating_scale=(1, 5))
data = Dataset.load_from_df(pd.DataFrame(ratings_dict), reader)
trainset = data.build_full_trainset()

# Use the KNNBasic algorithm with user-based cosine similarity
algo = KNNBasic(sim_options={'name': 'cosine', 'user_based': True})
algo.fit(trainset)

# Predict rating for user 'A' on movie 'M3'
pred = algo.predict('A', 'M3')
print(f"Predicted rating: {pred.est}")
```
Content-based filtering using movie feature vectors, which needs no ratings from other users:
```python
import numpy as np

# Movie features (e.g., one-hot genre vectors)
movie_features = np.array([[1, 0, 0],
                           [0, 1, 0],
                           [0, 0, 1]])
user_profile = np.array([1, 0, 0])  # user likes the first genre

# Similarity of each movie to the user's profile
similarities = movie_features.dot(user_profile)
recommendations = np.argsort(-similarities)
print(f"Recommended movies by content: {recommendations}")
```
Complexity: O(n^2 * m) time, O(n * m) space
Time Complexity
Brute-force neighbor search compares each user against every other user, O(n^2) pairs, and each cosine-distance computation scans m movie columns, giving O(n^2 * m) overall, where n is users and m is movies.
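To see where the O(n^2 * m) comes from, brute-force search effectively evaluates the full n x n distance matrix, which sklearn can also produce directly (a sketch using sklearn.metrics.pairwise):

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_distances

ratings = np.array([[5, 4, 0], [4, 0, 3], [0, 2, 5]])

# n x n matrix of pairwise distances; each entry costs O(m) to compute
D = cosine_distances(ratings)

print(D.shape)            # (3, 3)
print(np.round(D[0], 2))  # distances from user 0 to all users, ~[0.0, 0.38, 0.77]
```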
Space Complexity
Storing the user-movie matrix requires O(n * m) space, where n is users and m is movies.
Which Approach is Fastest?
sklearn's NearestNeighbors is efficient for small, dense datasets; for large sparse rating data, specialized libraries like Surprise, or sparse matrix input, scale better.
| Approach | Time | Space | Best For |
|---|---|---|---|
| sklearn NearestNeighbors | O(n^2 * m) | O(n * m) | Small to medium datasets |
| Surprise Library Collaborative Filtering | Optimized for sparse data | Sparse storage | Large sparse datasets |
| Content-Based Filtering | O(m) | O(m) | When user ratings are unavailable |
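For mostly-empty rating matrices, a sparse representation avoids the dense O(n * m) storage, since only the nonzero ratings are kept. sklearn's NearestNeighbors accepts scipy sparse input with metric='cosine' (a sketch, assuming scipy is installed):

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.neighbors import NearestNeighbors

ratings = np.array([[5, 4, 0], [4, 0, 3], [0, 2, 5]])
sparse_ratings = csr_matrix(ratings)  # stores only the 6 nonzero entries

model = NearestNeighbors(n_neighbors=2, metric='cosine').fit(sparse_ratings)
distances, indices = model.kneighbors(sparse_ratings[0])  # query user 0's row

print(indices[0])  # same neighbors as the dense version
```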