Recall & Review
beginner
What is the main purpose of a Transformer model in machine learning?
A Transformer model is designed to process sequences of data, like sentences, by paying attention to all parts of the sequence at once. This helps it understand context better than older models.
beginner
What does 'self-attention' mean in Transformer models?
Self-attention lets the model look at every word in a sentence and decide which words are important to understand each word better. It helps the model focus on relevant parts of the input.
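To make "deciding which words are important" concrete, here is a minimal NumPy sketch of scaled dot-product self-attention. It omits the learned query/key/value projection matrices a real Transformer uses, so it is an illustration of the mechanism, not a full implementation.

```python
import numpy as np

def self_attention(x):
    """Scaled dot-product self-attention over a sequence.

    x: (seq_len, d_model) array. For simplicity, queries, keys, and
    values are all taken to be x itself (no learned projections).
    """
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)  # how strongly each word attends to every other word
    # Softmax over each row turns scores into attention weights that sum to 1
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ x             # each output is a weighted mix of all words

# 3 "words", each represented by a 4-dimensional vector
x = np.random.randn(3, 4)
out = self_attention(x)
print(out.shape)  # (3, 4): one context-aware vector per word
```

Because every output row is a weighted average over all input rows, each word's new representation can draw on any other word in the sequence, no matter how far away it is.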
intermediate
In Simulink, how can you represent the multi-head attention mechanism of a Transformer?
You can model each attention head as its own parallel subsystem, each processing the same input with different weights, then use a concatenation block to merge the heads' outputs so the model captures diverse information.
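Simulink modeling is graphical, but the same parallel-heads-then-concatenate structure can be sketched in NumPy. This simplified version splits the feature dimension into per-head slices and again omits the learned projections, so treat it as a structural sketch of "parallel blocks feeding a combiner", not a production layer.

```python
import numpy as np

def softmax(s):
    e = np.exp(s - s.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_attention(x, num_heads):
    """Split d_model into num_heads slices, attend within each slice,
    then concatenate the heads' outputs (the 'combine' step)."""
    seq_len, d_model = x.shape
    d_head = d_model // num_heads
    heads = []
    for h in range(num_heads):
        xh = x[:, h * d_head:(h + 1) * d_head]   # this head's slice of the features
        scores = xh @ xh.T / np.sqrt(d_head)
        heads.append(softmax(scores) @ xh)       # each head attends independently
    return np.concatenate(heads, axis=-1)        # merge the parallel heads

x = np.random.randn(5, 8)
print(multi_head_attention(x, num_heads=2).shape)  # (5, 8)
```

Each loop iteration corresponds to one parallel subsystem in the Simulink picture, and the final `np.concatenate` plays the role of the block that combines the heads' outputs.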
beginner
Why do Transformer models use positional encoding?
Because Transformers look at all words at once, they need a way to know the order of words. Positional encoding adds information about word positions so the model understands sequence order.
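The standard way to add this position information is the sinusoidal encoding from the original Transformer paper ("Attention Is All You Need"): each position gets a unique pattern of sine and cosine values that is added to the word embeddings.

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encoding: sin on even feature indices,
    cos on odd ones, at wavelengths that grow with the index."""
    pos = np.arange(seq_len)[:, None]              # word positions 0..seq_len-1
    i = np.arange(0, d_model, 2)[None, :]          # even feature indices
    angles = pos / np.power(10000, i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

pe = positional_encoding(seq_len=10, d_model=16)
print(pe.shape)  # (10, 16): one encoding vector per position
# Added to the embeddings so the same word at different positions
# gets a different input vector:  x = embeddings + pe
```

Because every position produces a distinct vector, the model can tell "dog bites man" from "man bites dog" even though attention itself treats the words as an unordered set.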
intermediate
What is the role of the feed-forward network in a Transformer block?
After attention, the feed-forward network applies the same two-layer transformation to each position independently, adding a non-linearity that helps the model learn richer features than attention alone can provide.
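A minimal NumPy sketch of this position-wise feed-forward network, with randomly initialized weights standing in for learned parameters. The key point is that the same weights are applied to every position, and each position is processed independently of the others.

```python
import numpy as np

def feed_forward(x, w1, b1, w2, b2):
    """Position-wise feed-forward block: expand to a wider hidden
    layer with ReLU, then project back to the model dimension."""
    hidden = np.maximum(0, x @ w1 + b1)  # ReLU expansion to d_ff
    return hidden @ w2 + b2              # project back to d_model

d_model, d_ff, seq_len = 8, 32, 5
rng = np.random.default_rng(0)
w1, b1 = rng.standard_normal((d_model, d_ff)), np.zeros(d_ff)
w2, b2 = rng.standard_normal((d_ff, d_model)), np.zeros(d_model)

x = rng.standard_normal((seq_len, d_model))
print(feed_forward(x, w1, b1, w2, b2).shape)  # (5, 8): same shape in and out
```

Because the output shape matches the input shape, feed-forward blocks can be stacked after attention layers repeatedly, which is exactly how Transformer blocks are chained.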
What does the 'attention' mechanism in Transformers help the model do?
Attention helps the model focus on relevant parts of the input to understand context better.
Why is positional encoding necessary in Transformer models?
Transformers process all words at once, so positional encoding tells the model the order of words.
In Simulink, how can multi-head attention be modeled?
Parallel blocks represent multiple attention heads working simultaneously.
What is the output of the feed-forward network in a Transformer block?
The feed-forward network applies transformations to each position's data separately.
Which of these is NOT a component of a Transformer block?
Transformers do not use convolutional layers; they rely on attention and feed-forward networks.
Explain how self-attention works in a Transformer model and why it is important.
Think about how the model looks at all words to decide which ones matter.
Describe how you would model a Transformer block in Simulink, including key components.
Consider how to represent attention heads and data flow in Simulink.