Which statement best describes what cosine similarity measures between two vectors?
Think about how cosine similarity relates to the angle formed by vectors.
Cosine similarity measures the cosine of the angle between two vectors, focusing on their direction rather than magnitude.
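A minimal sketch of this, using NumPy (the helper name `cosine_similarity` is ours, not a library function), showing that scaling a vector leaves the similarity unchanged because only direction matters:

```python
import numpy as np

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (||a|| * ||b||) -- depends only on direction
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

v = np.array([1.0, 2.0, 3.0])
# Multiplying by 10 changes the magnitude but not the direction,
# so the cosine similarity stays at 1.0 (up to floating-point error).
print(round(cosine_similarity(v, 10 * v), 6))
```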
What is the output of the following Python code that calculates Euclidean distance between two vectors?
```python
import numpy as np

v1 = np.array([1, 2, 3])
v2 = np.array([4, 6, 3])
distance = np.linalg.norm(v1 - v2)
print(round(distance, 2))
```
Calculate the square root of the sum of squared differences.
The Euclidean distance is sqrt((1-4)^2 + (2-6)^2 + (3-3)^2) = sqrt(9 + 16 + 0) = sqrt(25) = 5.0. The code rounds to 2 decimals, printing 5.0.
You have very sparse high-dimensional vectors representing text documents. Which similarity metric is generally best suited to compare these vectors?
Think about which metric handles sparse data and presence/absence well.
Jaccard similarity is effective for sparse vectors: it treats each vector as a set of non-zero positions and measures the number of dimensions where both vectors are non-zero divided by the number where at least one is, focusing on overlap rather than magnitude.
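A small sketch of this set-based view, assuming a hand-rolled helper (`jaccard_similarity` is our name, not a library call) operating on the non-zero positions of two sparse document vectors:

```python
import numpy as np

def jaccard_similarity(a, b):
    # Treat each vector as the set of its non-zero indices:
    # |intersection of non-zero indices| / |union of non-zero indices|
    a_nz = a != 0
    b_nz = b != 0
    union = np.logical_or(a_nz, b_nz).sum()
    if union == 0:
        return 0.0  # both vectors all-zero: define similarity as 0
    return np.logical_and(a_nz, b_nz).sum() / union

doc1 = np.array([1, 0, 0, 2, 0, 1])
doc2 = np.array([0, 0, 1, 2, 0, 1])
# Shared non-zero positions: {3, 5}; union: {0, 2, 3, 5} -> 2/4 = 0.5
print(jaccard_similarity(doc1, doc2))
```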
What error does this code raise when calculating cosine similarity between two vectors?
```python
import numpy as np

v1 = np.array([1, 0, 0])
v2 = np.array([0, 0, 0])
cos_sim = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
print(cos_sim)
```
Check what happens when a vector has zero length in the denominator.
Since v2 is a zero vector, its norm is zero, so the expression evaluates the indeterminate form 0/0. NumPy does not raise an exception here: it returns nan (typically emitting a RuntimeWarning about an invalid value), and the code prints nan.
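A common defensive fix is to check the denominator before dividing. The helper below (`safe_cosine_similarity` and the 0.0 fallback are our conventions, not a NumPy API) is one minimal sketch:

```python
import numpy as np

def safe_cosine_similarity(a, b):
    # Guard against zero-length vectors before dividing
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0:
        return 0.0  # convention chosen here: a zero vector matches nothing
    return np.dot(a, b) / denom

print(safe_cosine_similarity(np.array([1, 0, 0]), np.array([0, 0, 0])))  # 0.0
```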
You have feature vectors extracted from images using a deep neural network. These vectors are dense and normalized to unit length. Which similarity metric is most appropriate to compare these vectors for image retrieval?
Consider the effect of normalization on distance metrics.
When vectors are normalized to unit length, Euclidean distance and cosine similarity are monotonically related (||a - b||^2 = 2(1 - cos(theta))), so they produce the same nearest-neighbor rankings. Cosine similarity is still preferred here because it directly measures angular similarity and, for unit vectors, reduces to a cheap dot product.
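The relationship between the two metrics for unit-length vectors can be checked numerically. This sketch uses random vectors as stand-ins for the image embeddings described above:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=128)
b = rng.normal(size=128)
a /= np.linalg.norm(a)  # normalize to unit length
b /= np.linalg.norm(b)

cos_sim = np.dot(a, b)        # norms are 1, so the dot product is cos(theta)
dist = np.linalg.norm(a - b)  # Euclidean distance

# For unit vectors: ||a - b||^2 = 2 * (1 - cos(theta))
print(np.isclose(dist**2, 2 * (1 - cos_sim)))  # True
```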