Recall & Review
Q (beginner): What is the main purpose of the Transformer architecture in machine learning?
A: The Transformer architecture is designed to process sequences of data, such as sentences, by attending to relationships between all parts of the sequence at once, enabling better understanding and generation of language.

Q (beginner): What does 'self-attention' mean in the Transformer model?
A: Self-attention is a mechanism in which the model compares every word in a sentence with every other word to decide which ones matter most for understanding each word, helping it capture context effectively.

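To make this concrete, here is a minimal sketch of scaled dot-product self-attention in plain NumPy. The weight matrices, sizes, and function names are illustrative assumptions for this study guide, not any particular library's API.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def self_attention(x, Wq, Wk, Wv):
    # project each token into query, key, and value vectors
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    # every token scores every other token; scale by sqrt(d_k) for stability
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = softmax(scores)      # each row sums to 1: attention per token
    return weights @ v             # weighted mix of all value vectors

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))        # 4 tokens, model dimension 8 (toy sizes)
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(x, Wq, Wk, Wv).shape)  # (4, 8)
```

Each row of `weights` sums to 1, so every token's output is a weighted average over all value vectors. That weighting is exactly the "deciding which words are important" step described above.
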
Q (intermediate): Name the two main parts of a Transformer encoder layer.
A: The two main sublayers are: 1) multi-head self-attention, which helps the model focus on different parts of the input simultaneously, and 2) a position-wise feed-forward network, which further transforms the representation at each position (see the sketch below).

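A minimal sketch of that second sublayer, assuming the standard residual-plus-layer-norm wiring from the original Transformer; the matrix names and sizes here are made up for illustration.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # normalize each position's feature vector to zero mean, unit variance
    mean = x.mean(axis=-1, keepdims=True)
    std = x.std(axis=-1, keepdims=True)
    return (x - mean) / (std + eps)

def feed_forward(x, W1, b1, W2, b2):
    # position-wise FFN: the same two-layer MLP applied at every position
    return np.maximum(0, x @ W1 + b1) @ W2 + b2   # ReLU between the layers

rng = np.random.default_rng(0)
attn_out = rng.normal(size=(4, 8))                # output of the attention sublayer
W1, b1 = rng.normal(size=(8, 32)), np.zeros(32)   # expand to a wider hidden size
W2, b2 = rng.normal(size=(32, 8)), np.zeros(8)    # project back to the model size
out = layer_norm(attn_out + feed_forward(attn_out, W1, b1, W2, b2))  # residual + norm
print(out.shape)  # (4, 8)
```
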
Q (intermediate): Why does the Transformer use 'positional encoding'?
A: Because Transformers do not process data in order the way older models do, positional encoding adds information about the position of each word in the sequence so the model knows the order of words.

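Here is a sketch of the sinusoidal positional encoding from the original Transformer paper; other schemes exist (e.g. learned position embeddings), so treat this as one common choice rather than the only one. It assumes an even model dimension.

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    # PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    # PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
    pos = np.arange(seq_len)[:, None]          # (seq_len, 1)
    i = np.arange(d_model // 2)[None, :]       # (1, d_model / 2)
    angles = pos / 10000 ** (2 * i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)               # even dimensions get sine
    pe[:, 1::2] = np.cos(angles)               # odd dimensions get cosine
    return pe

pe = positional_encoding(seq_len=10, d_model=8)
print(pe.shape)  # (10, 8); added to the token embeddings before the first layer
```
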
Q (intermediate): How does multi-head attention improve the Transformer’s understanding?
A: Multi-head attention lets the model look at the input from different perspectives at the same time, capturing various types of relationships between words, which improves understanding.

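The sketch below splits the model dimension across heads, runs scaled dot-product attention independently per head, and concatenates the results; all names and sizes are illustrative assumptions.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_attention(x, Wq, Wk, Wv, Wo, num_heads):
    seq_len, d_model = x.shape
    d_head = d_model // num_heads
    # project, then split the feature dimension into independent heads
    q = (x @ Wq).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    k = (x @ Wk).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    v = (x @ Wv).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)   # (heads, seq, seq)
    heads = softmax(scores) @ v                           # attention per head
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ Wo                                    # final output projection

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
Wq, Wk, Wv, Wo = (rng.normal(size=(8, 8)) for _ in range(4))
print(multi_head_attention(x, Wq, Wk, Wv, Wo, num_heads=2).shape)  # (4, 8)
```

Because each head has its own projections, one head might track nearby syntactic links while another attends to distant references, which is the "different perspectives" idea in the answer above.
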
Q: What problem does the Transformer architecture mainly solve compared to older models like RNNs?
A: Transformers process all words simultaneously, allowing better context understanding and faster training than sequential RNNs, as the toy comparison below illustrates.

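A toy illustration of that difference, with arbitrary shapes and weights: an RNN must step through tokens one at a time because each hidden state depends on the previous one, while self-attention scores every pair of tokens in a single matrix product.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                 # 4 tokens, dimension 8

# RNN-style: a Python loop, since step t needs the hidden state from step t-1
Wx, Wh = rng.normal(size=(8, 8)), rng.normal(size=(8, 8))
h = np.zeros(8)
for t in range(4):                          # inherently sequential
    h = np.tanh(x[t] @ Wx + h @ Wh)

# Transformer-style: all pairwise token interactions at once, no loop
scores = x @ x.T / np.sqrt(8)               # (4, 4) attention scores in one matmul
```
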
Q: What is the role of the feed-forward network in a Transformer encoder layer?
A: The feed-forward network processes the attention output, transforming the features before they are passed to the next layer.

Q: Why is positional encoding necessary in Transformers?
A: Transformers treat the input words as an unordered set, so positional encoding adds order information to help the model understand sequences.

Q: What does 'multi-head' mean in multi-head attention?
A: 'Multi-head' means the model runs several attention operations in parallel, each capturing different relationships.

Q: Which part of the Transformer helps it focus on important words in a sentence?
A: Self-attention, which lets the model weigh the importance of each word relative to the others.

Q (free response): Explain how self-attention works in the Transformer architecture and why it is important.
Hint: Think about how the model decides which words to focus on when reading a sentence.

Q (free response): Describe the role of positional encoding in Transformers and what problem it solves.
Hint: Consider why knowing word order is important for understanding sentences.