Practice


1. What does the dimensionality of an embedding vector mainly control in AI models?

easy

A. The color of the data points in visualization

B. The speed of the computer's processor

C. The level of detail or information captured about the item

D. The number of training examples needed

Solution

Step 1: Understand embedding vectors
An embedding vector represents an item (a word, an image, a document) as a list of numbers. Its length, the dimensionality, sets how much information about the item the vector can hold.
Step 2: Relate dimensionality to information
Each additional dimension gives the model room to encode another feature, so higher-dimensional embeddings can store finer detail about the item.
Final Answer:
The level of detail or information captured about the item -> Option C
Quick Check:
Embedding dimensionality = detail level [OK]

Hint: Embedding size = how detailed the vector is [OK]

Common Mistakes:

Confusing dimensionality with training speed
Thinking dimensionality affects data color
Assuming dimensionality controls dataset size
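
One way to make "more dimensions, more detail" concrete is to count how many linearly independent directions a set of embedding vectors can span. The sketch below uses NumPy with random vectors, which is an assumption for illustration only (real embeddings are learned, and NUM_ITEMS is a hypothetical value). It shows that the count is capped by the dimensionality.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
NUM_ITEMS = 100  # hypothetical number of items to embed


def independent_directions(dim: int) -> int:
    """Rank of a random (NUM_ITEMS x dim) matrix of embedding vectors."""
    vectors = rng.standard_normal((NUM_ITEMS, dim))
    return int(np.linalg.matrix_rank(vectors))


# However many items there are, the vectors cannot span more
# independent directions than the embedding has dimensions.
ranks = {dim: independent_directions(dim) for dim in (2, 16, 64)}
for dim, rank in ranks.items():
    print(f"dim={dim:>2}: vectors span {rank} independent directions")
```

This is only a rough proxy for "detail": it bounds how many unrelated features the vectors can keep apart, not how well a trained model actually uses them.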

2. Which of the following is the correct way to define an embedding layer with 50 dimensions in Python using PyTorch?

easy

A. nn.Embedding(dim=50, size=1000)

B. nn.Embedding(50, 1000)

C. nn.Embedding(embedding_size=50)

D. nn.Embedding(num_embeddings=1000, embedding_dim=50)

Solution

Step 1: Recall PyTorch embedding syntax
PyTorch's embedding layer uses nn.Embedding(num_embeddings, embedding_dim).
Step 2: Match parameters to question
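We want 50 dimensions, so embedding_dim=50. num_embeddings is the number of distinct indices the layer can look up, usually the vocabulary size (1000 here). Option B, nn.Embedding(50, 1000), runs without error but swaps the arguments: it creates 50 embeddings of 1000 dimensions each.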
Final Answer:
nn.Embedding(num_embeddings=1000, embedding_dim=50) -> Option D
Quick Check:
PyTorch embedding syntax = nn.Embedding(num_embeddings, embedding_dim) [OK]

Hint: Remember nn.Embedding(num_embeddings, embedding_dim) order [OK]

Common Mistakes:

Swapping num_embeddings and embedding_dim
Using wrong parameter names like dim or size
Omitting required parameters
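
The four options can be checked directly. The following is a minimal sketch, assuming PyTorch is installed; the variable names (correct, swapped, rejected) are illustrative only.

```python
from torch import nn

# Option D: 1000 rows (one per index), 50 numbers per row.
correct = nn.Embedding(num_embeddings=1000, embedding_dim=50)
print(tuple(correct.weight.shape))

# Option B runs, but the arguments are swapped: 50 rows of 1000 numbers.
swapped = nn.Embedding(50, 1000)
print(tuple(swapped.weight.shape))

# Options A and C pass keyword names that nn.Embedding does not accept,
# so construction fails before any layer is created.
rejected = []
for bad_kwargs in ({"dim": 50, "size": 1000}, {"embedding_size": 50}):
    try:
        nn.Embedding(**bad_kwargs)
    except TypeError as err:
        rejected.append(type(err).__name__)
print(rejected)
```

The shape of the weight matrix is (num_embeddings, embedding_dim), which is a quick way to confirm the argument order after constructing a layer.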

3. Consider this code snippet using TensorFlow to create embeddings:

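import tensorflow as tf

# Lookup table with 5000 rows (one per index) and 16 values per row.
embedding_layer = tf.keras.layers.Embedding(input_dim=5000, output_dim=16)
input_data = tf.constant([1, 2, 3])  # three integer indices
output = embedding_layer(input_data)
print(output.shape)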

What will be the printed shape?

medium

A. (3, 16)

B. (16, 3)

C. (3, 5000)

D. (5000, 16)

Solution

Step 1: Understand input and output dimensions
The input is a 1-D tensor of 3 integer indices. The layer maps each index to a 16-dimensional vector.
Step 2: Determine output shape
Output shape is (number of inputs, embedding dimension) = (3, 16).
Final Answer:
(3, 16) -> Option A
Quick Check:
Output shape = (input length, embedding dim) [OK]

Hint: Output shape = (input count, embedding size) [OK]

Common Mistakes:

Confusing embedding dimension with input dimension
Swapping rows and columns in output shape
Assuming output shape equals input_dim
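
The shape rule follows from what an embedding layer does: it selects rows from a lookup table. The sketch below imitates that lookup with plain NumPy indexing rather than TensorFlow. The random table is a stand-in for learned weights, and the names (table, indices, batch) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Stand-in for the layer's weights: 5000 rows (input_dim), 16 columns (output_dim).
table = rng.standard_normal((5000, 16))

# Three indices select three rows, so the result has 3 rows of 16 values.
indices = np.array([1, 2, 3])
output = table[indices]
print(output.shape)

# A batch of 2 sequences with 3 indices each keeps its leading shape
# and gains the embedding dimension at the end.
batch = np.array([[1, 2, 3], [4, 5, 6]])
batch_output = table[batch]
print(batch_output.shape)
```

The embedding dimension is always appended as the last axis, which is why (16, 3) and (3, 5000) cannot be right.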

4. You have an embedding layer defined as nn.Embedding(1000, 128) in PyTorch. You try to pass an input tensor with values outside the range 0-999. What error will most likely occur?

medium

A. TypeError because input is not a float

B. IndexError due to out-of-range indices

C. ValueError because embedding dimension is wrong

D. No error, embeddings handle any input values

Solution

Step 1: Understand embedding input constraints
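Embedding layers expect integer indices between 0 and num_embeddings - 1 (0 to 999 here).
Step 2: Identify error from invalid indices
Passing an index outside this range raises an IndexError on CPU ("index out of range in self"), because the lookup table has no row for that index. On a CUDA device the same mistake typically surfaces as a RuntimeError from a device-side assertion instead.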
Final Answer:
IndexError due to out-of-range indices -> Option B
Quick Check:
Embedding input indices must be valid [OK]

Hint: Embedding inputs must be valid indices [OK]

Common Mistakes:

Thinking embeddings accept any numeric input
Confusing input type errors with index errors
Assuming embedding dimension affects input range
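
The failure is easy to reproduce and to guard against. The following is a minimal sketch, assuming PyTorch running on CPU. check_indices is a hypothetical helper written for this example, not part of PyTorch.

```python
import torch
from torch import nn

layer = nn.Embedding(1000, 128)


def check_indices(indices: torch.Tensor, num_embeddings: int) -> bool:
    """Return True when every index lies in [0, num_embeddings - 1]."""
    return bool(((indices >= 0) & (indices < num_embeddings)).all())


good = torch.tensor([0, 42, 999])
bad = torch.tensor([0, 42, 1000])  # 1000 is one past the last valid row

print(tuple(layer(good).shape))

try:
    layer(bad)
    error_name = None
except (IndexError, RuntimeError) as err:
    # IndexError is expected on CPU; other backends may raise RuntimeError.
    error_name = type(err).__name__
print(error_name)

# Validating indices up front gives a clearer failure point than the lookup itself.
print(check_indices(good, 1000), check_indices(bad, 1000))
```

Checking indices before the forward pass is especially useful on GPU, where the resulting error message is usually less specific.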

5. You want to choose the embedding dimensionality for a text classification model. The vocabulary size is 10,000 words. Which embedding size is the best balance between capturing enough detail and keeping the model efficient?

hard

A. 128 dimensions

B. 5000 dimensions

C. 10000 dimensions

D. 16 dimensions

Solution

Step 1: Consider vocabulary size and embedding size trade-off
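Very small embeddings (such as 16 dimensions) may miss detail; very large ones (such as 5000 or 10000) are costly to train and store, and may overfit.
Step 2: Choose a moderate embedding size
128 dimensions is a common practical choice that balances detail and efficiency for a 10,000-word vocabulary. The embedding table then holds 10,000 x 128 = 1.28 million weights, versus 100 million at 10000 dimensions.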
Final Answer:
128 dimensions -> Option A
Quick Check:
Moderate embedding size balances detail and efficiency [OK]

Hint: Pick moderate size like 128 for balance [OK]

Common Mistakes:

Choosing an embedding size so small that it loses information
Choosing an embedding size so large that it wastes resources
Matching the embedding size exactly to the vocabulary size
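
The cost side of the trade-off is simple arithmetic: the embedding table has one weight per (word, dimension) pair. The sketch below tabulates this for the four options, assuming 4-byte float32 weights. The function name embedding_table_cost is illustrative.

```python
VOCAB_SIZE = 10_000
BYTES_PER_FLOAT32 = 4  # assumption: weights are stored as float32


def embedding_table_cost(vocab_size: int, dim: int) -> tuple[int, float]:
    """Return (weight count, size in megabytes) for an embedding table."""
    weights = vocab_size * dim
    megabytes = weights * BYTES_PER_FLOAT32 / 1_000_000
    return weights, megabytes


costs = {dim: embedding_table_cost(VOCAB_SIZE, dim) for dim in (16, 128, 5000, 10000)}
for dim, (weights, megabytes) in costs.items():
    print(f"dim={dim:>5}: {weights:>11,} weights, {megabytes:>6.2f} MB")
```

Going from 128 to 10000 dimensions multiplies the table size by roughly 78, which is the efficiency cost the question asks you to weigh against the extra detail.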

Epoch	Loss ↓	Accuracy ↑	Observation
1	0.85	0.55	Starting training with high loss and low accuracy
2	0.65	0.70	Loss decreases, accuracy improves
3	0.50	0.78	Model learns meaningful patterns
4	0.40	0.83	Continued improvement
5	0.35	0.86	Training converges well

Embedding dimensionality considerations in Prompt Engineering / GenAI - Model Pipeline Trace
