Introduction
Cosine similarity measures how alike two items are by looking at the angle between their feature vectors, ignoring the vectors' magnitudes.
cosine_similarity = (A · B) / (||A|| * ||B||)
Where:
- A and B are vectors
- · means dot product
- ||A|| means length (magnitude) of vector A

A = [1, 0, 1]
B = [0, 1, 1]
cosine_similarity = (1*0 + 0*1 + 1*1) / (sqrt(1**2+0**2+1**2) * sqrt(0**2+1**2+1**2)) = 1 / (sqrt(2)*sqrt(2)) = 0.5

A = [2, 3]
B = [4, 6]
cosine_similarity = (2*4 + 3*6) / (sqrt(2**2+3**2) * sqrt(4**2+6**2)) = 26 / (sqrt(13)*sqrt(52)) = 1.0
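The two worked examples can be checked with a few lines of Python. This is a minimal sketch using only the standard library; `cosine_similarity_plain` is a hypothetical helper name, not a library function, and raising an error for zero vectors is a choice made for this sketch.

```python
import math

def cosine_similarity_plain(a, b):
    """Cosine similarity of two equal-length sequences of numbers.

    Hypothetical helper for illustration; not part of any library.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))        # A · B
    norm_a = math.sqrt(sum(x * x for x in a))     # ||A||
    norm_b = math.sqrt(sum(y * y for y in b))     # ||B||
    if norm_a == 0 or norm_b == 0:
        # The angle is undefined when either vector has zero length.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

example_1 = cosine_similarity_plain([1, 0, 1], [0, 1, 1])
example_2 = cosine_similarity_plain([2, 3], [4, 6])
print(round(example_1, 4))  # 0.5
print(round(example_2, 4))  # 1.0
```

The second pair gives 1.0 because [4, 6] is just [2, 3] scaled by 2: the vectors point in the same direction, so their different lengths do not affect the result.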
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

# Two example text vectors (e.g., TF-IDF or embeddings)
vector1 = np.array([[1, 2, 3, 4]])
vector2 = np.array([[4, 3, 2, 1]])

# Calculate cosine similarity
similarity = cosine_similarity(vector1, vector2)
print(f"Cosine similarity: {similarity[0][0]:.4f}")
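scikit-learn's `cosine_similarity` takes 2-D inputs (one row per sample) and returns a matrix with one entry per pair of rows, which is why the result is read with `similarity[0][0]`. The sketch below mimics that shape in plain Python to make it explicit. `pairwise_cosine` is a hypothetical name, and this is not how scikit-learn implements it.

```python
import math

def pairwise_cosine(rows_x, rows_y):
    """Return a nested list where entry [i][j] compares rows_x[i] with rows_y[j].

    Hypothetical illustration only; assumes no row is all zeros.
    """
    def cosine(a, b):
        dot = sum(p * q for p, q in zip(a, b))
        norm_a = math.sqrt(sum(p * p for p in a))
        norm_b = math.sqrt(sum(q * q for q in b))
        return dot / (norm_a * norm_b)

    return [[cosine(x, y) for y in rows_y] for x in rows_x]

vector1 = [[1, 2, 3, 4]]  # one sample with four features
vector2 = [[4, 3, 2, 1]]

similarity = pairwise_cosine(vector1, vector2)
print(f"Cosine similarity: {similarity[0][0]:.4f}")  # Cosine similarity: 0.6667
```

With one row on each side the result is a 1 x 1 matrix; passing several rows would give one similarity per pair of rows.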
A and B?

Given A = [1, 2, 3] and B = [4, 5, 6], what is the cosine similarity (rounded to 2 decimals)?

import numpy as np

def cosine_sim(a, b):
    # Divide the dot product by the product of the two norms, not by the norm of a + b
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

A = np.array([1, 0])
B = np.array([0, 1])
print(cosine_sim(A, B))

doc1 = [0, 1, 2, 0] and doc2 = [1, 0, 1, 1]. Which step is best to improve cosine similarity comparison for very sparse vectors?
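For very sparse vectors, one option is to store only the nonzero entries and work over those. That is an assumption made for illustration here, not necessarily the answer the question intends. A minimal sketch with dictionaries mapping index to value follows; `to_sparse` and `sparse_cosine` are hypothetical helper names.

```python
import math

def to_sparse(dense):
    """Keep only the nonzero entries of a dense list as {index: value}."""
    return {i: v for i, v in enumerate(dense) if v != 0}

def sparse_cosine(a, b):
    """Cosine similarity of two sparse vectors stored as {index: value} dicts.

    Returns 0.0 when either vector has no nonzero entries. That is a
    convention chosen for this sketch, since the value is undefined there.
    """
    if not a or not b:
        return 0.0
    # Iterate over the smaller dict; only shared indices add to the dot product.
    small, large = (a, b) if len(a) <= len(b) else (b, a)
    dot = sum(v * large[i] for i, v in small.items() if i in large)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

doc1 = to_sparse([0, 1, 2, 0])  # {1: 1, 2: 2}
doc2 = to_sparse([1, 0, 1, 1])  # {0: 1, 2: 1, 3: 1}
score = sparse_cosine(doc1, doc2)
print(round(score, 4))  # 0.5164
```

Only index 2 is nonzero in both documents, so the dot product is 2*1 = 2, and the result is 2 / (sqrt(5) * sqrt(3)).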