Challenge - 5 Problems
Transformer Encoder Mastery
Get all challenges correct to earn this badge!
Test your skills under time pressure!
❓ Predict Output
Intermediate · 2:00 remaining
Output shape of Transformer encoder layer
Given the following PyTorch code snippet, what is the shape of the output tensor after passing through the Transformer encoder layer?
PyTorch
import torch
import torch.nn as nn

batch_size = 4
seq_length = 10
embedding_dim = 32

x = torch.rand(batch_size, seq_length, embedding_dim)
encoder_layer = nn.TransformerEncoderLayer(d_model=embedding_dim, nhead=4, batch_first=True)
output = encoder_layer(x)
print(output.shape)
Attempts: 2 left
💡 Hint
Remember that nn.TransformerEncoderLayer with batch_first=True expects input of shape (batch_size, seq_length, embedding_dim) and returns a tensor of the same shape.
✗ Incorrect
With batch_first=True, the Transformer encoder layer in PyTorch expects input of shape (batch_size, seq_length, embedding_dim) and outputs the same shape. Here the output is torch.Size([4, 10, 32]), matching the input exactly.
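You can verify the shape-preserving behavior directly with a minimal sketch using the same dimensions as the problem (standard PyTorch API, no assumptions beyond that):

```python
import torch
import torch.nn as nn

batch_size, seq_length, embedding_dim = 4, 10, 32
x = torch.rand(batch_size, seq_length, embedding_dim)

encoder_layer = nn.TransformerEncoderLayer(d_model=embedding_dim, nhead=4, batch_first=True)
output = encoder_layer(x)

# the encoder layer preserves the input shape
print(output.shape)  # torch.Size([4, 10, 32])
```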
❓ Model Choice
Intermediate · 1:30 remaining
Choosing the number of attention heads
You want to create a Transformer encoder layer with embedding dimension 64. Which choice of number of attention heads is valid?
Attempts: 2 left
💡 Hint
The embedding dimension must be divisible by the number of attention heads.
✗ Incorrect
The number of attention heads must divide the embedding dimension evenly, since each head receives an equal slice of the embedding. With 8 heads, 64 / 8 = 8, giving each head a dimension of 8. The other options do not divide 64 evenly.
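A quick sketch illustrating the divisibility constraint: a valid head count constructs fine, while an invalid one fails at construction time (in recent PyTorch versions this surfaces as an AssertionError, though the exact exception type may vary):

```python
import torch.nn as nn

# nhead=8 divides d_model=64 evenly (per-head dimension 64 // 8 = 8), so this works
valid_layer = nn.TransformerEncoderLayer(d_model=64, nhead=8, batch_first=True)

# nhead=5 does not divide 64, so constructing the layer raises
try:
    nn.TransformerEncoderLayer(d_model=64, nhead=5, batch_first=True)
except Exception as e:
    print("invalid head count:", e)  # embed_dim must be divisible by num_heads
```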
❓ Hyperparameter
Advanced · 1:30 remaining
Effect of increasing dropout in Transformer encoder
What is the most likely effect of increasing the dropout rate in a Transformer encoder layer during training?
Attempts: 2 left
💡 Hint
Dropout randomly disables parts of the network during training.
✗ Incorrect
Increasing dropout helps prevent overfitting by forcing the model not to rely on any single neuron, which can improve generalization. Note that dropout is only active during training; it is disabled in evaluation mode.
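The train/eval distinction can be seen directly: with a high dropout rate, repeated forward passes in training mode give different outputs, while evaluation mode is deterministic. A minimal sketch (the dropout=0.5 value is chosen only for illustration):

```python
import torch
import torch.nn as nn

layer = nn.TransformerEncoderLayer(d_model=32, nhead=4, dropout=0.5, batch_first=True)
x = torch.rand(2, 5, 32)

# training mode: dropout is active, so repeated passes almost always differ
layer.train()
out_a = layer(x)
out_b = layer(x)
print(torch.allclose(out_a, out_b))  # almost always False

# evaluation mode: dropout is disabled, so repeated passes are identical
layer.eval()
with torch.no_grad():
    out_c = layer(x)
    out_d = layer(x)
print(torch.allclose(out_c, out_d))  # True
```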
🔧 Debug
Advanced · 2:00 remaining
Identifying error in Transformer encoder input shape
What goes wrong when this code runs the Transformer encoder layer?
PyTorch
import torch
import torch.nn as nn

x = torch.rand(10, 4, 32)  # shape (seq_length, batch_size, embedding_dim)

encoder_layer = nn.TransformerEncoderLayer(d_model=32, nhead=4, batch_first=True)
output = encoder_layer(x)
print(output.shape)
Attempts: 2 left
💡 Hint
Check the expected input shape for nn.TransformerEncoderLayer in PyTorch.
✗ Incorrect
With batch_first=True, PyTorch's nn.TransformerEncoderLayer expects input of shape (batch_size, seq_length, embedding_dim), but the code provides (seq_length, batch_size, embedding_dim). Because the last dimension still matches d_model=32, no runtime error is raised; instead the layer silently treats the sequence dimension as the batch dimension and attends over the wrong axis. Fix it by passing batch_first=False (the default) or by transposing the input.
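A sketch demonstrating the silent misinterpretation: both layouts run without raising and produce the same shape, which is exactly why this bug is easy to miss:

```python
import torch
import torch.nn as nn

x = torch.rand(10, 4, 32)  # intended as (seq_length, batch_size, embedding_dim)

# batch_first=True reads dim 0 as batch and dim 1 as sequence,
# so this runs without error but treats 10 as the batch size
layer_bf = nn.TransformerEncoderLayer(d_model=32, nhead=4, batch_first=True)
out_bf = layer_bf(x)
print(out_bf.shape)  # torch.Size([10, 4, 32]) -- no error, but wrong interpretation

# batch_first=False (the default) matches the intended (seq, batch, embed) layout
layer_sf = nn.TransformerEncoderLayer(d_model=32, nhead=4, batch_first=False)
out_sf = layer_sf(x)
print(out_sf.shape)  # torch.Size([10, 4, 32])
```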
🧠 Conceptual
Expert · 2:30 remaining
Why use multi-head attention in Transformer encoder?
What is the main advantage of using multi-head attention instead of a single attention head in a Transformer encoder?
Attempts: 2 left
💡 Hint
Think about how multiple attention heads help the model understand different aspects of the input.
✗ Incorrect
Multi-head attention lets the model attend to the input from multiple representation subspaces simultaneously: each head computes its own attention pattern, capturing diverse features and relationships that a single head would have to average together.
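The per-head structure is visible in nn.MultiheadAttention's returned attention weights: with 4 heads, there are 4 separate attention maps per example. A minimal sketch (the average_attn_weights argument assumes a reasonably recent PyTorch version):

```python
import torch
import torch.nn as nn

embed_dim, num_heads = 32, 4
mha = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

x = torch.rand(2, 10, embed_dim)  # (batch, seq, embed)
out, weights = mha(x, x, x, average_attn_weights=False)

print(out.shape)      # torch.Size([2, 10, 32])  -- same shape as the input
print(weights.shape)  # torch.Size([2, 4, 10, 10]) -- one attention map per head
```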