
Similarity search and retrieval in Prompt Engineering / GenAI - Interactive Code Practice

Practice - 5 Tasks
Answer the questions below
1. Fill in the blank (easy)

Complete the code to compute cosine similarity between two vectors.

from sklearn.metrics.pairwise import [1]

vector_a = [[1, 2, 3]]
vector_b = [[4, 5, 6]]
similarity = [1](vector_a, vector_b)
print(similarity)
Options:
A. pairwise_distances
B. euclidean_distances
C. manhattan_distances
D. cosine_similarity
Common Mistakes
Using distance functions instead of similarity functions.
Confusing cosine similarity with Euclidean distance.
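The difference between the two measures can be seen in a small NumPy sketch. The vectors below are illustrative values chosen for this example: they point in the same direction but have different lengths, so the cosine similarity is 1 while the Euclidean distance is well above 0.

```python
import numpy as np

# Two illustrative vectors that point the same way but differ in length.
a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])

# Cosine similarity depends only on the angle between the vectors.
cosine_sim = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Euclidean distance depends on how far apart the two points are.
euclidean_dist = np.linalg.norm(a - b)

print(f"cosine similarity:  {cosine_sim:.3f}")
print(f"Euclidean distance: {euclidean_dist:.3f}")
```

A higher similarity means "more alike", whereas a higher distance means "less alike", so the two cannot be swapped without also flipping how the results are ranked.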
2. Fill in the blank (medium)

Complete the code to find the index of the most similar vector in a list using cosine similarity.

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

vectors = np.array([[1, 0], [0, 1], [1, 1]])
query = np.array([[0.9, 0.1]])
similarities = cosine_similarity(query, vectors)
most_similar_index = np.argmax([1])
print(most_similar_index)
Options:
A. similarities
B. vectors
C. query
D. np.array
Common Mistakes
Using the original vectors array instead of similarity scores.
Trying to find max on the query vector.
3. Fill in the blank (hard)

Complete the code to compute the pairwise Euclidean distances between the vectors.

from sklearn.metrics.pairwise import [1]

vectors = [[1, 2], [3, 4], [5, 6]]
distances = [1](vectors)
print(distances)
Options:
A. cosine_similarity
B. euclidean_distances
C. manhattan_distances
D. pairwise_kernels
Common Mistakes
Using similarity functions instead of distance functions.
Passing a single list instead of a 2D array.
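To see what a pairwise distance matrix contains, here is a minimal NumPy sketch using the same three vectors. It is a hand-written illustration based on broadcasting, not how scikit-learn computes the result internally, and the names diffs and pairwise are made up for this example.

```python
import numpy as np

# The input must be 2D: one row per vector.
vectors = np.array([[1, 2], [3, 4], [5, 6]], dtype=float)

# diffs[i, j] holds vectors[i] - vectors[j]; its shape is (3, 3, 2).
diffs = vectors[:, None, :] - vectors[None, :, :]

# Summing the squared differences over the last axis and taking the
# square root gives a 3 x 3 matrix of Euclidean distances.
pairwise = np.sqrt((diffs ** 2).sum(axis=-1))

print(np.round(pairwise, 3))
```

The diagonal is zero because every vector is at distance 0 from itself, and the matrix is symmetric because the distance from i to j equals the distance from j to i.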
4. Fill in the blank (hard)

Fill both blanks to create a dictionary of word lengths for words longer than 3 characters.

words = ['apple', 'bat', 'carrot', 'dog', 'elephant']
lengths = {word: [1] for word in words if len(word) [2] 3}
print(lengths)
Options:
A. len(word)
B. <=
C. >
D. word
Common Mistakes
Using the word itself instead of its length.
Using the wrong comparison operator.
5. Fill in the blank (hard)

Fill all three blanks to create a filtered dictionary with uppercase keys and values greater than 2.

data = {'a': 1, 'b': 3, 'c': 5, 'd': 2}
filtered = {[1]: [2] for k, v in data.items() if v [3] 2}
print(filtered)
Options:
A. k.upper()
B. v
C. >
D. k.lower()
Common Mistakes
Not converting keys to uppercase.
Using wrong comparison operator or filtering condition.

Practice

1.

What is the main goal of similarity search in machine learning?

easy
A. To count the number of items in a dataset
B. To sort items alphabetically
C. To find items that are close or alike in a collection
D. To remove duplicate items from a list

Solution

  1. Step 1: Understand the purpose of similarity search

    Similarity search is used to find items that are similar or close to each other in a dataset.
  2. Step 2: Compare options with the definition

    Only option C, "To find items that are close or alike in a collection", describes finding similar items, which is the goal of similarity search.
  3. Final Answer:

    To find items that are close or alike in a collection -> Option C
  4. Quick Check:

    Similarity search = find similar items [OK]
Hint: Similarity search finds close or alike items [OK]
Common Mistakes:
  • Confusing similarity search with sorting
  • Thinking similarity search counts items
  • Assuming it removes duplicates
2.

Which of the following is the correct way to compute cosine similarity between two vectors A and B in Python using numpy?

import numpy as np
A = np.array([1, 2, 3])
B = np.array([4, 5, 6])
# What code computes cosine similarity?
easy
A. np.dot(A, B) * (np.linalg.norm(A) + np.linalg.norm(B))
B. np.dot(A, B) / (np.linalg.norm(A) - np.linalg.norm(B))
C. np.sum(A * B) / (np.linalg.norm(A) - np.linalg.norm(B))
D. np.dot(A, B) / (np.linalg.norm(A) * np.linalg.norm(B))

Solution

  1. Step 1: Recall cosine similarity formula

    Cosine similarity = dot product of A and B divided by product of their norms.
  2. Step 2: Match formula to code options

    Option D, np.dot(A, B) / (np.linalg.norm(A) * np.linalg.norm(B)), matches the formula exactly. The other options add, subtract, or multiply by the norms instead of dividing by their product.
  3. Final Answer:

    np.dot(A, B) / (np.linalg.norm(A) * np.linalg.norm(B)) -> Option D
  4. Quick Check:

    Cosine similarity = dot / (norm A * norm B) [OK]
Hint: Cosine similarity = dot product divided by norms product [OK]
Common Mistakes:
  • Adding norms instead of multiplying
  • Subtracting norms in denominator
  • Multiplying dot product by sum of norms
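As a quick check of the formula, the sketch below wraps it in a small helper and applies it to A and B from the question. The helper name cosine_sim is illustrative, and the sketch assumes neither vector is all zeros, since a zero norm would make the division undefined.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity of two 1-D vectors (illustrative helper name).

    Assumes neither vector is all zeros.
    """
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

A = np.array([1, 2, 3])
B = np.array([4, 5, 6])

# dot = 32, norms = sqrt(14) and sqrt(77)
result = cosine_sim(A, B)
print(round(result, 4))
```

The value comes out close to 1 (about 0.9746) because A and B point in nearly the same direction.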
3.

Given the following vectors, what is the cosine similarity between vec1 and vec2?

import numpy as np
vec1 = np.array([1, 0, 0])
vec2 = np.array([0, 1, 0])
cos_sim = np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))
print("{:.2f}".format(cos_sim))
medium
A. 0.00
B. 0.50
C. -1.00
D. 1.00

Solution

  1. Step 1: Calculate dot product of vec1 and vec2

    Dot product = 1*0 + 0*1 + 0*0 = 0.
  2. Step 2: Calculate norms and cosine similarity

    Norm of vec1 = 1, norm of vec2 = 1, so cosine similarity = 0 / (1*1) = 0.
  3. Final Answer:

    0.00 -> Option A
  4. Quick Check:

    Orthogonal vectors have cosine similarity 0 [OK]
Hint: Orthogonal vectors have cosine similarity zero [OK]
Common Mistakes:
  • Confusing dot product with cosine similarity
  • Forgetting to divide by norms
  • Rounding errors causing wrong answer
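The orthogonal case is one of three reference points worth remembering. The sketch below, using illustrative vectors and a hypothetical helper cosine_sim, compares a base vector with a perpendicular vector, a vector in the same direction, and a vector in the opposite direction.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity of two non-zero 1-D vectors (illustrative helper)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

base = np.array([1, 0, 0])

orthogonal = cosine_sim(base, np.array([0, 1, 0]))   # perpendicular
same_dir = cosine_sim(base, np.array([3, 0, 0]))     # same direction, longer
opposite = cosine_sim(base, np.array([-2, 0, 0]))    # opposite direction

print(f"{orthogonal:.2f} {same_dir:.2f} {opposite:.2f}")
# prints: 0.00 1.00 -1.00
```

Vector length does not affect the result: only the direction matters.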
4.

Consider this code snippet for similarity search. What is the error?

import numpy as np
vectors = [np.array([1, 2]), np.array([3, 4])]
query = np.array([1, 0])
scores = []
for v in vectors:
    score = np.dot(query, v) / np.linalg.norm(query) * np.linalg.norm(v)
    scores.append(score)
print(scores)
medium
A. Missing parentheses causing wrong order of operations
B. Using np.dot instead of np.cross
C. Vectors have different lengths
D. Query vector is not normalized

Solution

  1. Step 1: Analyze the cosine similarity formula in code

    The formula should divide dot product by product of norms: dot(query, v) / (norm(query) * norm(v)).
  2. Step 2: Identify missing parentheses

    Code does np.dot(query, v) / np.linalg.norm(query) * np.linalg.norm(v). Because / and * have equal precedence and are evaluated left to right, this computes (dot / norm(query)) * norm(v), which multiplies by norm(v) instead of dividing by it.
  3. Final Answer:

    Missing parentheses causing wrong order of operations -> Option A
  4. Quick Check:

    Use parentheses to group denominator multiplication [OK]
Hint: Use parentheses to group denominator in cosine similarity [OK]
Common Mistakes:
  • Forgetting parentheses around denominator
  • Using cross product instead of dot product
  • Ignoring vector length mismatch
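The effect of the missing parentheses is easy to see on one of the vectors from the snippet. This sketch computes both versions side by side; the variable names buggy and fixed are introduced here for illustration.

```python
import numpy as np

query = np.array([1, 0])
v = np.array([3, 4])

# Without parentheses: (dot / norm(query)) * norm(v) = (3 / 1) * 5
buggy = np.dot(query, v) / np.linalg.norm(query) * np.linalg.norm(v)

# With parentheses: dot / (norm(query) * norm(v)) = 3 / (1 * 5)
fixed = np.dot(query, v) / (np.linalg.norm(query) * np.linalg.norm(v))

print(buggy, fixed)
```

The buggy value is 15, far outside the valid range of -1 to 1 for cosine similarity, which is a useful sanity check when debugging similarity code.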
5.

You have a collection of text documents converted into vectors. You want to find the top 2 most similar documents to a new query vector using cosine similarity. The approach has three steps:

  1. Compute cosine similarity between query and each document vector.
  2. Sort documents by similarity score descending.
  3. Return top 2 documents.

Which code snippet correctly implements this?

import numpy as np

docs = [np.array([1, 0]), np.array([0, 1]), np.array([1, 1])]
query = np.array([1, 0])

# Choose the correct code:
hard
A.
    scores = [np.dot(query, d) * np.linalg.norm(query) * np.linalg.norm(d) for d in docs]
    top2 = sorted(scores)[:2]
    print(top2)
B.
    scores = [np.dot(query, d) / (np.linalg.norm(query) * np.linalg.norm(d)) for d in docs]
    top2 = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:2]
    print(top2)
C.
    scores = [np.dot(query, d) / (np.linalg.norm(query) - np.linalg.norm(d)) for d in docs]
    top2 = sorted(range(len(scores)), key=lambda i: scores[i])[:2]
    print(top2)
D.
    scores = [np.cross(query, d) / (np.linalg.norm(query) * np.linalg.norm(d)) for d in docs]
    top2 = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:2]
    print(top2)

Solution

  1. Step 1: Compute cosine similarity correctly

    Option B computes each score as the dot product divided by the product of the norms, which is the cosine similarity formula.
  2. Step 2: Sort indices by similarity descending and select top 2

    Option B sorts the document indices by score with reverse=True and keeps the first two, matching the requirement.
  3. Final Answer:

    Cosine scores, then sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:2] -> Option B
  4. Quick Check:

    Cosine similarity + sort descending + top 2 = Option B [OK]
Hint: Compute cosine similarity, sort descending, pick top results [OK]
Common Mistakes:
  • Multiplying the dot product by the norms instead of dividing by them
  • Using cross product instead of dot product
  • Sorting ascending instead of descending
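For larger collections, the same three steps are often written without a Python loop. The following is a minimal vectorized sketch using the documents from the question; it assumes every document vector and the query are non-zero, and it stacks the documents into a single 2D array, which the original snippet does not do.

```python
import numpy as np

# One row per document (stacked into a 2D array for this sketch).
docs = np.array([[1, 0], [0, 1], [1, 1]], dtype=float)
query = np.array([1, 0], dtype=float)

# Step 1: cosine similarity between the query and every document at once.
scores = docs @ query / (np.linalg.norm(docs, axis=1) * np.linalg.norm(query))

# Steps 2 and 3: sort indices by descending score and keep the first two.
top2 = np.argsort(-scores)[:2]

print(top2.tolist())
# prints: [0, 2]
```

Document 0 matches the query exactly (score 1), and document 2 shares one of its two components (score about 0.707), so they rank first and second.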