PyTorch · ML · ~5 mins

Transformer encoder in PyTorch - Cheat Sheet & Quick Revision

Recall & Review
beginner
What is the main purpose of a Transformer encoder in machine learning?
A Transformer encoder processes input data by capturing relationships between all parts of the input simultaneously, helping models understand context and meaning without relying on sequence order alone.
beginner
What is 'self-attention' in the context of a Transformer encoder?
Self-attention is a mechanism where the model looks at all parts of the input to decide which parts are important to focus on when encoding each word or token.
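A minimal sketch of the idea above: scaled dot-product self-attention computed by hand on a toy sequence. (In a real encoder, Q, K, and V come from learned linear projections; here we reuse the input directly to keep the example small.)

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(4, 8)  # one sequence: 4 tokens, embedding size 8

# Assumption for brevity: Q = K = V = x (a real layer uses learned projections).
q, k, v = x, x, x

d_k = q.size(-1)
scores = q @ k.T / d_k ** 0.5        # (4, 4) pairwise token similarities
weights = F.softmax(scores, dim=-1)  # each row sums to 1: "how much to attend"
out = weights @ v                    # each token becomes a weighted mix of all tokens

print(out.shape)  # torch.Size([4, 8])
```

Each output row mixes information from every input token, which is why self-attention captures context regardless of distance in the sequence.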
intermediate
Name the two main sub-layers inside a Transformer encoder block.
The two main sub-layers are: 1) Multi-head self-attention layer, 2) Position-wise feed-forward neural network.
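The two sub-layers can be sketched as a hand-rolled encoder block (a hypothetical `MiniEncoderBlock`, not PyTorch's built-in one), with the residual connections and layer norms that wrap each sub-layer:

```python
import torch
import torch.nn as nn

class MiniEncoderBlock(nn.Module):
    """Illustrative encoder block: attention sub-layer + feed-forward sub-layer."""
    def __init__(self, d_model=16, nhead=2, dim_ff=32):
        super().__init__()
        # Sub-layer 1: multi-head self-attention
        self.attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        # Sub-layer 2: position-wise feed-forward network
        self.ff = nn.Sequential(
            nn.Linear(d_model, dim_ff), nn.ReLU(), nn.Linear(dim_ff, d_model)
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        attn_out, _ = self.attn(x, x, x)  # tokens attend to each other
        x = self.norm1(x + attn_out)      # residual connection + layer norm
        x = self.norm2(x + self.ff(x))    # residual connection + layer norm
        return x

x = torch.randn(2, 5, 16)  # (batch, seq_len, d_model)
y = MiniEncoderBlock()(x)
print(y.shape)  # torch.Size([2, 5, 16])
```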
intermediate
Why do Transformer encoders use 'positional encoding'?
Because Transformers process all input tokens at once, positional encoding adds information about the order of tokens so the model knows the sequence position of each token.
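A sketch of the classic sinusoidal positional encoding (sine on even dimensions, cosine on odd ones), which is added to the token embeddings before the first encoder layer:

```python
import torch

def sinusoidal_positional_encoding(seq_len, d_model):
    """Sine/cosine positional encoding; each position gets a unique pattern."""
    pos = torch.arange(seq_len).unsqueeze(1).float()   # (seq_len, 1)
    i = torch.arange(0, d_model, 2).float()            # even dimension indices
    angle = pos / torch.pow(10000.0, i / d_model)      # (seq_len, d_model/2)
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(angle)                     # even dims: sine
    pe[:, 1::2] = torch.cos(angle)                     # odd dims: cosine
    return pe

pe = sinusoidal_positional_encoding(10, 16)
print(pe.shape)  # torch.Size([10, 16])
# Usage: x = token_embeddings + pe  (broadcast over the batch dimension)
```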
beginner
In PyTorch, which class can be used to create a Transformer encoder layer?
You can use torch.nn.TransformerEncoderLayer to create a single encoder layer, and torch.nn.TransformerEncoder to stack multiple layers.
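Putting the two classes together, a minimal sketch of building a stacked encoder in PyTorch (hyperparameters here are arbitrary example values):

```python
import torch
import torch.nn as nn

# One encoder layer; batch_first=True means inputs are (batch, seq, d_model).
layer = nn.TransformerEncoderLayer(
    d_model=32, nhead=4, dim_feedforward=64, batch_first=True
)
# Stack 3 identical layers into a full encoder.
encoder = nn.TransformerEncoder(layer, num_layers=3)

x = torch.randn(8, 10, 32)  # (batch, seq_len, d_model)
out = encoder(x)
print(out.shape)  # torch.Size([8, 10, 32])
```

Note that the encoder preserves the input shape: it re-represents each token in context rather than changing the sequence length.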
What does the self-attention mechanism in a Transformer encoder help the model do?
A) Reduce the size of the input data
B) Focus on important parts of the input sequence
C) Sort the input tokens alphabetically
D) Generate output tokens directly
Answer: B
Which component adds information about token order in a Transformer encoder?
A) Positional encoding
B) Layer normalization
C) Feed-forward network
D) Multi-head attention
Answer: A
What is the role of the feed-forward network inside a Transformer encoder layer?
A) To apply a simple neural network to each position independently
B) To add positional information
C) To combine outputs from multiple attention heads
D) To normalize the input data
Answer: A
In PyTorch, which class stacks multiple Transformer encoder layers?
A) torch.nn.TransformerEncoderLayer
B) torch.nn.MultiheadAttention
C) torch.nn.TransformerEncoder
D) torch.nn.Linear
Answer: C
Why do Transformer encoders process all tokens simultaneously instead of one by one?
A) Because they cannot handle sequences
B) To generate output faster
C) To sort tokens before processing
D) To reduce training time and capture global context
Answer: D
Explain how self-attention works inside a Transformer encoder and why it is important.
Hint: Think about how the model decides what parts of the input to focus on.
Describe the main components of a Transformer encoder layer and their roles.
Hint: Consider the flow of data through the encoder block.