Recall & Review
beginner
What is the main purpose of a Transformer encoder in machine learning?
A Transformer encoder processes input data by capturing relationships between all parts of the input simultaneously, helping models understand context and meaning without relying on sequence order alone.
beginner
What is 'self-attention' in the context of a Transformer encoder?
Self-attention is a mechanism where the model looks at all parts of the input to decide which parts are important to focus on when encoding each word or token.
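The answer above can be sketched in a few lines of PyTorch. This is a minimal self-attention, assuming no learned query/key/value projections and a single head, just to show the core idea of weighing every token against every other token:

```python
import torch
import torch.nn.functional as F

def self_attention(x):
    # x: (seq_len, d_model). For simplicity, queries, keys, and values
    # are all the raw input (no learned projections in this sketch).
    d_model = x.size(-1)
    # Scores: how relevant each token is to every other token.
    scores = x @ x.transpose(-2, -1) / d_model ** 0.5
    weights = F.softmax(scores, dim=-1)  # each row sums to 1
    return weights @ x                   # weighted mix of all tokens

x = torch.randn(5, 8)        # 5 tokens, 8-dim embeddings
out = self_attention(x)
print(out.shape)             # torch.Size([5, 8])
```

Each output row is a context-aware blend of the whole sequence, which is what lets the encoder capture relationships regardless of distance.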
intermediate
Name the two main sub-layers inside a Transformer encoder block.
The two main sub-layers are: 1) Multi-head self-attention layer, 2) Position-wise feed-forward neural network.
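The two sub-layers can be wired together in a short sketch. The sizes and names here are illustrative, and the residual-plus-LayerNorm wrapping follows the standard encoder block design:

```python
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    def __init__(self, d_model=64, nhead=4, d_ff=256):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        # Sub-layer 1: multi-head self-attention (residual + norm)
        attn_out, _ = self.attn(x, x, x)
        x = self.norm1(x + attn_out)
        # Sub-layer 2: position-wise feed-forward network (residual + norm)
        return self.norm2(x + self.ff(x))

x = torch.randn(2, 10, 64)      # (batch, seq_len, d_model)
y = EncoderBlock()(x)
print(y.shape)                  # torch.Size([2, 10, 64])
```

Note that the feed-forward network is applied to each position independently, while the attention sub-layer is the only place tokens exchange information.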
intermediate
Why do Transformer encoders use 'positional encoding'?
Because Transformers process all input tokens at once, positional encoding adds information about the order of tokens so the model knows the sequence position of each token.
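One common scheme is the sinusoidal encoding from the original Transformer paper, sketched below: even dimensions use sine and odd dimensions use cosine at geometrically spaced frequencies, giving every position a unique pattern that is simply added to the token embeddings.

```python
import torch

def positional_encoding(seq_len, d_model):
    # pos: (seq_len, 1), i: indices of the even embedding dimensions
    pos = torch.arange(seq_len).unsqueeze(1).float()
    i = torch.arange(0, d_model, 2).float()
    angles = pos / 10000 ** (i / d_model)
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(angles)   # even dims: sine
    pe[:, 1::2] = torch.cos(angles)   # odd dims: cosine
    return pe

embeddings = torch.randn(10, 16)              # 10 tokens, 16-dim
x = embeddings + positional_encoding(10, 16)  # added, not concatenated
```

Because the encoding is added element-wise, the model sees the same embedding space, just shifted by a position-dependent signal.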
beginner
In PyTorch, which class can be used to create a Transformer encoder layer?
You can use torch.nn.TransformerEncoderLayer to create a single encoder layer, and torch.nn.TransformerEncoder to stack multiple layers.
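A minimal usage sketch of a single encoder layer; the hyperparameters below are illustrative, not prescribed:

```python
import torch
import torch.nn as nn

# One encoder layer: multi-head self-attention + feed-forward network.
layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)

x = torch.randn(2, 10, 64)   # (batch, seq_len, d_model)
out = layer(x)
print(out.shape)             # torch.Size([2, 10, 64])
```

The layer preserves the input shape, which is what makes encoder layers stackable.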
What does the self-attention mechanism in a Transformer encoder help the model do?
Self-attention helps the model focus on important parts of the input sequence by weighing the relevance of each token to others.
Which component adds information about token order in a Transformer encoder?
Positional encoding adds information about the position of tokens since the Transformer processes tokens in parallel.
What is the role of the feed-forward network inside a Transformer encoder layer?
The feed-forward network applies the same small neural network to each token's representation independently, adding non-linearity and extra transformation capacity.
In PyTorch, which class stacks multiple Transformer encoder layers?
torch.nn.TransformerEncoder stacks multiple TransformerEncoderLayer instances to build a full encoder.
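A short sketch of the stacking described above, with illustrative sizes (the real model's hyperparameters would differ):

```python
import torch
import torch.nn as nn

# TransformerEncoder deep-copies the layer num_layers times.
layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=6)

x = torch.randn(2, 10, 64)   # (batch, seq_len, d_model)
out = encoder(x)
print(out.shape)             # torch.Size([2, 10, 64])
```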
Why do Transformer encoders process all tokens simultaneously instead of one by one?
Processing all tokens simultaneously lets the model learn relationships between every pair of tokens directly and, unlike recurrent models, allows the whole sequence to be computed in parallel, which speeds up training.
Explain how self-attention works inside a Transformer encoder and why it is important.
Think about how the model decides what parts of the input to focus on.
Describe the main components of a Transformer encoder layer and their roles.
Consider the flow of data through the encoder block.