Embedding generation in Prompt Engineering / GenAI - Practice Problems & Coding Challenges

Challenge - 5 Problems
🧠 Conceptual
intermediate
What is the main purpose of embedding generation in machine learning?

Embedding generation transforms raw data into a format that machine learning models can understand better. What is the main goal of this process?

A. To increase the size of the dataset by duplicating samples
B. To convert data into fixed-size vectors capturing semantic meaning
C. To remove all noise from the data by filtering
D. To convert numerical data into categorical labels
💡 Hint

Think about how words or images are represented so models can work with them.
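
To make the "fixed-size" part of the answer concrete, here is a minimal sketch. The `toy_embed` function is a hypothetical stand-in built on hashed word counts, not a real embedding model: it shows only that inputs of any length map to vectors of the same length. A trained model would additionally place texts with similar meanings close together, which this toy does not do.

```python
import zlib

def toy_embed(text: str, dim: int = 8) -> list[float]:
    """Hypothetical stand-in for an embedding model: hashed word counts."""
    vec = [0.0] * dim
    for word in text.lower().split():
        # crc32 is deterministic across runs, unlike Python's built-in hash().
        vec[zlib.crc32(word.encode("utf-8")) % dim] += 1.0
    return vec

short = toy_embed("Hello world")
long = toy_embed("Machine learning is fun and AI helps humans every day")

# Both vectors have the same length even though the texts differ in size.
print(len(short), len(long))  # 8 8
```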

Predict Output
intermediate
What is the output shape of the embedding vector?

Given the following code snippet generating embeddings for 3 sentences using a model that outputs 768-dimensional vectors, what is the shape of the resulting embedding array?

sentences = ['Hello world', 'Machine learning is fun', 'AI helps humans']
embeddings = model.encode(sentences)
print(embeddings.shape)
A. (768,)
B. (768, 3)
C. (3, 3)
D. (3, 768)
💡 Hint

Each sentence gets its own vector of length 768.
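
The snippet in the question does not define `model`, so the sketch below assumes a hypothetical `StandInModel` that returns random 768-dimensional vectors. It is not a real library class; it only reproduces the one-row-per-sentence layout that the question describes.

```python
import numpy as np

class StandInModel:
    """Hypothetical encoder, not a real library class: random 768-dim vectors."""

    def __init__(self, dim: int = 768, seed: int = 0):
        self.dim = dim
        self.rng = np.random.default_rng(seed)

    def encode(self, sentences: list[str]) -> np.ndarray:
        # One row per input sentence, one column per embedding dimension.
        return self.rng.standard_normal((len(sentences), self.dim))

model = StandInModel()
sentences = ['Hello world', 'Machine learning is fun', 'AI helps humans']
embeddings = model.encode(sentences)

print(embeddings.shape)     # (3, 768)
print(embeddings[0].shape)  # (768,)
```

Indexing a single row gives the `(768,)` shape from option A, which is the shape of one sentence's vector rather than of the whole batch.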

Model Choice
advanced
Which model type is best suited for generating contextual word embeddings?

You want to generate embeddings that capture the meaning of words depending on their sentence context. Which model type should you choose?

A. Transformer-based model like BERT
B. Recurrent Neural Network (RNN) without attention
C. Simple Bag-of-Words model
D. K-Nearest Neighbors (KNN) classifier
💡 Hint

Think about models that understand word order and context deeply.

Metrics
advanced
Which metric is most appropriate to evaluate similarity between two embedding vectors?

You have two embedding vectors representing sentences. Which metric best measures how similar their meanings are?

A. Mean squared error
B. Euclidean distance
C. Cosine similarity
D. Accuracy
💡 Hint

Consider a metric that measures the angle between vectors rather than their length.
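
The hint can be checked with a small sketch. The vectors below are made up for illustration: `b` points in the same direction as `a` but is ten times longer, so the angle between them is zero while the straight-line distance is large.

```python
import numpy as np

def cosine_similarity(a, b):
    # Dot product divided by the product of the vector lengths.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.array([1.0, 2.0, 3.0])
b = 10 * a  # same direction, ten times longer

cos = cosine_similarity(a, b)
dist = float(np.linalg.norm(a - b))  # Euclidean distance

print(round(cos, 6))   # 1.0
print(round(dist, 3))  # 33.675
```

Cosine similarity ignores the difference in length, whereas Euclidean distance is dominated by it.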

🔧 Debug
expert
Why does this embedding generation code raise a TypeError?

Examine the code below that attempts to generate embeddings for a list of texts. Why does it raise a TypeError?

texts = ['data science', 'deep learning']
embeddings = model.encode(texts[0], texts[1])
A. The encode method expects a single list argument, not multiple string arguments
B. The model variable is not defined
C. The encode method requires integer inputs, not strings
D. The texts list is empty
💡 Hint

Check how the encode method is called and what arguments it expects.
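
A minimal reproduction, assuming a hypothetical `StandInModel` whose `encode` accepts exactly one argument. Real libraries may accept further positional parameters, so the exact failure and message can differ; the sketch only shows the calling pattern.

```python
class StandInModel:
    """Hypothetical encoder whose encode() takes exactly one argument."""

    def encode(self, sentences):
        # Toy "embedding": a one-element vector holding each text's length.
        return [[float(len(s))] for s in sentences]

model = StandInModel()
texts = ['data science', 'deep learning']

try:
    model.encode(texts[0], texts[1])  # two positional arguments
    raised = False
except TypeError as exc:
    raised = True
    print("TypeError:", exc)

embeddings = model.encode(texts)  # pass the whole list instead
print(len(embeddings))  # 2
```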

Practice

1. What is the main purpose of embedding generation in AI?
easy
A. To convert text or items into number vectors for easier comparison
B. To translate text from one language to another
C. To generate random numbers for encryption
D. To create images from text descriptions

Solution

  1. Step 1: Understand embedding generation

    Embedding generation transforms text or items into number vectors that computers can process.
  2. Step 2: Identify the main purpose

    This transformation helps in comparing meanings and finding similarities between data.
  3. Final Answer:

    To convert text or items into number vectors for easier comparison -> Option A
  4. Quick Check:

    Embedding = number vectors [OK]
Hint: Embeddings turn words into numbers for comparison [OK]
Common Mistakes:
  • Confusing embeddings with translation
  • Thinking embeddings generate images
  • Believing embeddings create random numbers
2. Which of the following is the correct way to represent an embedding vector in Python?
easy
A. embedding = {0.1, 0.5, 0.3, 0.9}
B. embedding = '0.1, 0.5, 0.3, 0.9'
C. embedding = [0.1, 0.5, 0.3, 0.9]
D. embedding = (0.1 0.5 0.3 0.9)

Solution

  1. Step 1: Identify valid Python data structures for vectors

    Embedding vectors are usually lists or arrays of numbers in Python.
  2. Step 2: Check each option

    Option C, embedding = [0.1, 0.5, 0.3, 0.9], is a list with comma-separated values, which is correct. Option A is a set (unordered, and duplicates are dropped), Option B is a string, and Option D is invalid syntax because the commas are missing.
  3. Final Answer:

    embedding = [0.1, 0.5, 0.3, 0.9] -> Option C
  4. Quick Check:

    Embedding vector = list of numbers [OK]
Hint: Embedding vectors are lists of numbers in Python [OK]
Common Mistakes:
  • Using strings instead of lists
  • Using sets which are unordered
  • Incorrect tuple syntax without commas
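
A short sketch of why a set is a poor container for an embedding. The values are made up for illustration and deliberately repeat 0.5.

```python
as_list = [0.5, 0.1, 0.5, 0.9]
as_set = {0.5, 0.1, 0.5, 0.9}

print(len(as_list))  # 4
print(len(as_set))   # 3, the duplicate 0.5 is dropped
print(as_list[2])    # 0.5, position 2 is well defined in a list

try:
    as_set[2]
    indexable = True
except TypeError:
    indexable = False
    print("sets do not support indexing")
```

Each position in an embedding corresponds to one dimension, so a container that drops duplicates and has no positions cannot represent it.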
3. Given the following code snippet, what will be the output?
import numpy as np
text_embedding = np.array([0.2, 0.4, 0.6])
query_embedding = np.array([0.1, 0.3, 0.5])
similarity = np.dot(text_embedding, query_embedding)
print(round(similarity, 2))
medium
A. 0.44
B. 0.28
C. 0.36
D. 0.52

Solution

  1. Step 1: Calculate the dot product of the two vectors

    Dot product = (0.2*0.1) + (0.4*0.3) + (0.6*0.5) = 0.02 + 0.12 + 0.30 = 0.44
  2. Step 2: Round the result to 2 decimal places

    Rounded value = 0.44
  3. Final Answer:

    0.44 -> Option A
  4. Quick Check:

    Dot product = 0.44 [OK]
Hint: Dot product sums element-wise products [OK]
Common Mistakes:
  • Multiplying vectors element-wise without summing
  • Rounding before summing
  • Confusing dot product with vector length
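
The first and third mistakes above can be seen side by side in this sketch, which reuses the vectors from the question: the element-wise product is still a vector, its sum is the dot product, and the vector length (norm) is a different quantity.

```python
import numpy as np

text_embedding = np.array([0.2, 0.4, 0.6])
query_embedding = np.array([0.1, 0.3, 0.5])

products = text_embedding * query_embedding  # element-wise, still a vector
similarity = float(products.sum())           # summing gives the dot product
length = float(np.linalg.norm(text_embedding))

print(products.shape)        # (3,)
print(round(similarity, 2))  # 0.44
print(round(length, 2))      # 0.75
```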
4. The following code is intended to compute cosine similarity between two embeddings. Does it contain an error, and if so, what is it?
import numpy as np
def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

vec1 = np.array([1, 0, 0])
vec2 = np.array([0, 1, 0])
print(cosine_similarity(vec1, vec2))
medium
A. Division by zero error when vectors are zero
B. No error; code works correctly
C. Using lists instead of numpy arrays
D. Incorrect use of np.dot instead of np.cross

Solution

  1. Step 1: Analyze the cosine similarity function

    The function correctly computes dot product divided by product of norms.
  2. Step 2: Check the example vectors and output

    Vectors are numpy arrays and non-zero, so no division by zero occurs. The code runs correctly and prints 0.0.
  3. Final Answer:

    No error; code works correctly -> Option B
  4. Quick Check:

    Cosine similarity code = correct [OK]
Hint: Check for zero vectors to avoid division errors [OK]
Common Mistakes:
  • Confusing dot product with cross product
  • Forgetting to use numpy arrays
  • Not handling zero vectors causing division errors
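
The zero-vector case from the hint can be guarded as sketched below. With NumPy, dividing by a zero norm typically produces `nan` with a runtime warning rather than raising an exception. The name `safe_cosine_similarity` and the choice to return 0.0 are assumptions made for this sketch, not a standard convention.

```python
import numpy as np

def safe_cosine_similarity(a, b):
    """Cosine similarity that returns 0.0 if either vector has zero length.

    Returning 0.0 is a convention chosen for this sketch, not a standard.
    """
    norm_product = np.linalg.norm(a) * np.linalg.norm(b)
    if norm_product == 0:
        return 0.0
    return float(np.dot(a, b) / norm_product)

orthogonal = safe_cosine_similarity(np.array([1, 0, 0]), np.array([0, 1, 0]))
with_zero = safe_cosine_similarity(np.array([0, 0, 0]), np.array([0, 1, 0]))
parallel = safe_cosine_similarity(np.array([1, 1, 0]), np.array([2, 2, 0]))

print(orthogonal)          # 0.0
print(with_zero)           # 0.0
print(round(parallel, 6))  # 1.0
```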
5. You have a list of product descriptions and want to group similar products using embeddings. Which approach best helps you achieve this?
hard
A. Manually read and group descriptions without embeddings
B. Translate descriptions to another language before clustering
C. Use embeddings only for images, not text
D. Generate embeddings for each description, then use clustering on these vectors

Solution

  1. Step 1: Understand the goal of grouping similar products

    Grouping similar products means finding which descriptions are close in meaning.
  2. Step 2: Use embeddings and clustering

    Generating embeddings converts descriptions into vectors. Clustering groups vectors close in space, thus grouping similar products.
  3. Final Answer:

    Generate embeddings for each description, then use clustering on these vectors -> Option D
  4. Quick Check:

    Embedding + clustering = grouping similar items [OK]
Hint: Cluster embedding vectors to group similar items [OK]
Common Mistakes:
  • Thinking translation helps grouping
  • Assuming embeddings only work for images
  • Ignoring embeddings and grouping manually
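
The approach in option D can be sketched as follows. The three-dimensional vectors are hand-made stand-ins for real description embeddings, and the grouping is a simple greedy threshold rule chosen to keep the example dependency-free. It is not k-means or any other specific clustering algorithm.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def group_by_similarity(vectors, threshold=0.9):
    """Greedy grouping: join the first group whose first member is similar enough."""
    groups = []  # each group is a list of indices into `vectors`
    for i, vec in enumerate(vectors):
        for group in groups:
            if cosine(vec, vectors[group[0]]) >= threshold:
                group.append(i)
                break
        else:
            # No existing group was close enough, so start a new one.
            groups.append([i])
    return groups

# Hypothetical descriptions with hand-made vectors standing in for embeddings.
descriptions = ["red running shoes", "blue trail sneakers",
                "steel water bottle", "insulated flask"]
embeddings = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.1, 0.9, 0.2],
    [0.0, 0.8, 0.3],
]

groups = group_by_similarity(embeddings)
for group in groups:
    print([descriptions[i] for i in group])
```

With these vectors the two footwear descriptions land in one group and the two drink containers in another. In practice the vectors would come from an embedding model and the grouping from a library clustering routine.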