
Matrix factorization basics in ML Python

Introduction

Matrix factorization breaks a large table of numbers into smaller, simpler tables. This makes it easier to find hidden patterns and make predictions. It is useful in situations such as:

When you want to recommend movies to users based on their past ratings.
When you need to fill in missing values in a large dataset.
When you want to find groups or clusters in data like customer preferences.
When you want to reduce the size of data while keeping important information.
When you want to understand relationships between items and users in a simple way.
Syntax
ML Python
Given a matrix R (m x n), find two matrices P (m x k) and Q (k x n) such that:
R ≈ P x Q
where k is the number of latent features (smaller than m and n).

The number k controls how much detail you keep.

Matrix multiplication of P and Q reconstructs an approximation of R.
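As a quick sketch of the shapes involved (using NumPy, with m=5, n=3, and k=2 chosen only for illustration), multiplying P and Q always produces a matrix with the same shape as R:

```python
import numpy as np

m, n, k = 5, 3, 2  # example dimensions, assumed for illustration

P = np.random.rand(m, k)  # one row of k latent features per user
Q = np.random.rand(k, n)  # one column of k latent features per item

R_approx = P @ Q          # reconstruction has shape (m, n)
print(R_approx.shape)     # (5, 3)
```

Because k is smaller than m and n, P and Q together hold fewer numbers than R, which is what makes the factorization a compressed summary of the data.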

Examples
This example shows a user-item rating matrix with missing values (0). We want to find two smaller matrices to approximate it.
ML Python
R = [[5, 3, 0],
     [4, 0, 0],
     [1, 1, 0],
     [0, 0, 5],
     [0, 0, 4]]

Find P (5 x 2) and Q (2 x 3) such that R ≈ P x Q
These are example factor matrices. Multiplying P and Q gives an approximation of the original matrix R.
ML Python
P = [[1.2, 0.5],
     [1.0, 0.3],
     [0.5, 0.7],
     [0.0, 1.5],
     [0.1, 1.3]]

Q = [[4.0, 2.5, 0.1],
     [0.2, 0.3, 4.5]]
Sample Model

This code shows a simple matrix factorization using gradient descent to approximate a user-item rating matrix. It updates two smaller matrices to reconstruct the original matrix.

ML Python
import numpy as np

# Original matrix with missing values as zeros
R = np.array([
    [5, 3, 0],
    [4, 0, 0],
    [1, 1, 0],
    [0, 0, 5],
    [0, 0, 4]
], dtype=float)

# Number of latent features
k = 2

# Initialize P and Q with random values
np.random.seed(42)
P = np.random.rand(R.shape[0], k)
Q = np.random.rand(k, R.shape[1])

# Set learning rate and iterations
alpha = 0.01
iterations = 5000

# Matrix factorization using gradient descent
for _ in range(iterations):
    for i in range(R.shape[0]):
        for j in range(R.shape[1]):
            if R[i, j] > 0:  # Only update for known ratings
                eij = R[i, j] - np.dot(P[i, :], Q[:, j])
                for r in range(k):
                    p_old = P[i, r]  # keep old value so both updates use it
                    P[i, r] += alpha * 2 * eij * Q[r, j]
                    Q[r, j] += alpha * 2 * eij * p_old

# Reconstruct the matrix
R_hat = np.dot(P, Q)

# Print original and reconstructed matrices
print("Original matrix R:")
print(R)
print("\nReconstructed matrix R_hat (approximation):")
print(np.round(R_hat, 2))
Important Notes

Matrix factorization works best when the matrix is mostly filled with known values.

Choosing the right number of latent features (k) is important: too small loses detail, too large may overfit.

Gradient descent updates P and Q to reduce the difference between the original and reconstructed matrix.
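The "difference" being reduced is the squared error summed over the known (non-zero) entries only. A minimal sketch of computing it, reusing the matrix R from the sample model with hypothetical random factors P and Q:

```python
import numpy as np

R = np.array([[5, 3, 0],
              [4, 0, 0],
              [1, 1, 0],
              [0, 0, 5],
              [0, 0, 4]], dtype=float)

# Hypothetical factors of the right shapes, for illustration only
np.random.seed(0)
P = np.random.rand(5, 2)
Q = np.random.rand(2, 3)

mask = R > 0                        # True where a rating is known
error = np.sum((R - P @ Q)[mask] ** 2)
print(error)                        # total squared error on known entries
```

Each gradient descent iteration nudges P and Q so this number gets smaller; entries where R is zero are ignored, which is what lets the model predict them instead of fitting them to zero.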

Summary

Matrix factorization breaks a big matrix into smaller ones to find hidden patterns.

It helps predict missing values and understand relationships in data.

Simple algorithms like gradient descent can find these smaller matrices step by step.