Python ML Program to Recommend Movies using sklearn
Fit NearestNeighbors(n_neighbors=2) on a user-movie ratings matrix and find similar users with kneighbors(), then recommend the movies those neighbors liked but the target user hasn't rated.
How to Think About It
If two users rated their shared movies similarly, the movies one of them liked are good candidates for the other. Nearest-neighbor search over the rows of the ratings matrix finds those similar users.
Algorithm
1. Build a user-movie ratings matrix (0 means unrated).
2. Fit NearestNeighbors with the cosine metric on the user rows.
3. Query kneighbors() for the target user to get the most similar users.
4. Collect the movies the target user hasn't rated.
5. Recommend any unrated movie that a neighbor rated at or above a liking threshold (3 here).
Code
```python
from sklearn.neighbors import NearestNeighbors
import numpy as np

# Sample user-movie ratings matrix (rows: users, cols: movies)
ratings = np.array([
    [5, 4, 0],
    [4, 0, 3],
    [0, 2, 5]
])

model = NearestNeighbors(n_neighbors=2, metric='cosine')
model.fit(ratings)

user_id = 0
# Find 2 nearest neighbors for user 0 (the user itself is included)
distances, indices = model.kneighbors([ratings[user_id]])

# Movies user 0 hasn't rated
user_movies = ratings[user_id]
unrated = np.where(user_movies == 0)[0]

# Recommend movies rated by neighbors but not by user 0
recommendations = set()
for neighbor in indices[0][1:]:  # skip the user itself
    neighbor_ratings = ratings[neighbor]
    for movie in unrated:
        if neighbor_ratings[movie] >= 3:  # threshold for liking
            recommendations.add(movie)

print(f"Recommended movies for user {user_id}: Movie indices {sorted(recommendations)}")
```
Dry Run
Let's trace user 0's recommendations through the code
1. Input ratings matrix: ratings = [[5,4,0],[4,0,3],[0,2,5]]
2. Fit the NearestNeighbors model: the model indexes the user rows so cosine distances between rating vectors can be queried.
3. Find neighbors for user 0: indices = [0, 1], distances ≈ [0.0, 0.38] (user 0 is always its own nearest neighbor at distance 0).
4. Identify movies user 0 hasn't rated: unrated = [2].
5. Check neighbors' ratings for unrated movies: neighbor 1 rated movie 2 as 3, which meets the liking threshold, so movie 2 is added.
6. Output: Recommended movies for user 0: Movie indices [2]
| Step | User | Neighbors | Unrated Movies | Recommendations |
|---|---|---|---|---|
| 1. Fit model | User 0 | - | - | - |
| 2. Find neighbors | User 0 | [0, 1] | - | - |
| 3. Find unrated movies | User 0 | [0, 1] | [2] | - |
| 4. Recommend | User 0 | [1] (self skipped) | [2] | [2] |
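The numbers above can be reproduced directly. This sketch reruns just the neighbor query; the non-zero distance is 1 minus the cosine similarity between users 0 and 1:

```python
from sklearn.neighbors import NearestNeighbors
import numpy as np

ratings = np.array([[5, 4, 0], [4, 0, 3], [0, 2, 5]])
model = NearestNeighbors(n_neighbors=2, metric='cosine').fit(ratings)

# Query with a 2D row (shape (1, 3)) for user 0
distances, indices = model.kneighbors(ratings[[0]])

print(indices[0])    # user 0 itself comes first, then its nearest neighbor
print(distances[0])  # cosine distances, ~[0.0, 0.38]
print(np.where(ratings[0] == 0)[0])  # movies user 0 hasn't rated
```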
Why This Works
Step 1: Modeling user similarity
We use NearestNeighbors with cosine distance to find users with similar movie rating patterns.
Step 2: Finding neighbors
For a target user, we use kneighbors() to find the users whose rating vectors are most similar.
Step 3: Recommending movies
We recommend movies that neighbors liked (rating >= 3) but the target user hasn't rated yet.
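The cosine distance in Step 1 can be computed by hand to see why user 1 ends up closer to user 0 than user 2 (a minimal numpy sketch of the same metric NearestNeighbors uses):

```python
import numpy as np

ratings = np.array([[5, 4, 0], [4, 0, 3], [0, 2, 5]])

def cosine_distance(a, b):
    # 1 - cosine similarity between two rating vectors
    return 1 - a.dot(b) / (np.linalg.norm(a) * np.linalg.norm(b))

d01 = cosine_distance(ratings[0], ratings[1])  # ~0.38
d02 = cosine_distance(ratings[0], ratings[2])  # ~0.77
print(d01 < d02)  # user 1 is the nearer neighbor of user 0
```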
Alternative Approaches
Collaborative filtering with the Surprise library:
```python
from surprise import Dataset, Reader, KNNBasic
import pandas as pd

# Prepare data in (user, item, rating) form
ratings_dict = {
    'userID': ['A', 'A', 'B', 'B', 'C'],
    'itemID': ['M1', 'M2', 'M2', 'M3', 'M1'],
    'rating': [5, 3, 4, 2, 4]
}
reader = Reader(rating_scale=(1, 5))
data = Dataset.load_from_df(pd.DataFrame(ratings_dict), reader)
trainset = data.build_full_trainset()

# Use the KNNBasic algorithm with user-based cosine similarity
algo = KNNBasic(sim_options={'name': 'cosine', 'user_based': True})
algo.fit(trainset)

# Predict rating for user 'A' on movie 'M3'
pred = algo.predict('A', 'M3')
print(f"Predicted rating: {pred.est}")
```
Content-based filtering using movie feature vectors, which needs no ratings from other users:
```python
import numpy as np

# Movie features (e.g., one-hot genre vectors)
movie_features = np.array([[1, 0, 0],
                           [0, 1, 0],
                           [0, 0, 1]])
user_profile = np.array([1, 0, 0])  # user likes the first genre

# Similarity of each movie to the user's profile
similarities = movie_features.dot(user_profile)
recommendations = np.argsort(-similarities)
print(f"Recommended movies by content: {recommendations}")
```
Complexity: O(n^2 * m) time, O(n * m) space
Time Complexity
Brute-force neighbor search compares each user against every other user, O(n^2) pairs, and each cosine-distance computation scans m movie columns, giving O(n^2 * m) overall, where n is users and m is movies.
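To see where the O(n^2 * m) comes from, brute-force search effectively evaluates the full n x n distance matrix, which sklearn can also produce directly (a sketch using sklearn.metrics.pairwise):

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_distances

ratings = np.array([[5, 4, 0], [4, 0, 3], [0, 2, 5]])

# n x n matrix of pairwise distances; each entry costs O(m) to compute
D = cosine_distances(ratings)

print(D.shape)            # (3, 3)
print(np.round(D[0], 2))  # distances from user 0 to all users, ~[0.0, 0.38, 0.77]
```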
Space Complexity
Storing the user-movie matrix requires O(n * m) space, where n is users and m is movies.
Which Approach is Fastest?
sklearn's NearestNeighbors is efficient for small, dense datasets; for large sparse rating data, specialized libraries like Surprise, or sparse matrix input, scale better.
| Approach | Time | Space | Best For |
|---|---|---|---|
| sklearn NearestNeighbors | O(n^2 * m) | O(n * m) | Small to medium datasets |
| Surprise Library Collaborative Filtering | Optimized for sparse data | Sparse storage | Large sparse datasets |
| Content-Based Filtering | O(m) | O(m) | When user ratings are unavailable |
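For mostly-empty rating matrices, a sparse representation avoids the dense O(n * m) storage, since only the nonzero ratings are kept. sklearn's NearestNeighbors accepts scipy sparse input with metric='cosine' (a sketch, assuming scipy is installed):

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.neighbors import NearestNeighbors

ratings = np.array([[5, 4, 0], [4, 0, 3], [0, 2, 5]])
sparse_ratings = csr_matrix(ratings)  # stores only the 6 nonzero entries

model = NearestNeighbors(n_neighbors=2, metric='cosine').fit(sparse_ratings)
distances, indices = model.kneighbors(sparse_ratings[0])  # query user 0's row

print(indices[0])  # same neighbors as the dense version
```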