Recall & Review
beginner
What is self-attention in simple terms?
Self-attention is a way for a model to look at all the words in a sentence at once and decide which other words are most relevant for understanding each word.
intermediate
Why do we use multi-head attention instead of just one attention?
Multi-head attention lets the model look at the sentence from different views or angles at the same time, helping it understand more details and relationships.
intermediate
In self-attention, what are queries, keys, and values?
Queries, keys, and values are three vectors computed from each input word. The model compares a word's query with every key to score how relevant the other words are, then takes a weighted sum of the values to build that word's output.
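The query/key/value comparison described above can be sketched in a few lines of NumPy. This is a minimal toy illustration with random matrices (`Wq`, `Wk`, `Wv` and the "embeddings" `X` are made up for demonstration), not a trained model:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence X of shape (seq_len, d)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # compare each query with every key
    weights = softmax(scores, axis=-1)        # each row sums to 1: attention per word
    return weights @ V                        # weighted sum of values

rng = np.random.default_rng(0)
d = 8
X = rng.normal(size=(4, d))                   # 4 "words" with toy embeddings
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)                              # one output vector per input word
```

Note that every word's output is built from all the values at once, which is the "look at all parts of the sentence in parallel" idea from the cards above.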
beginner
How does self-attention help in understanding the meaning of a word in a sentence?
Self-attention helps by giving more focus to words that matter for understanding a word’s meaning, like paying attention to related words nearby or far away in the sentence.
intermediate
What is the main benefit of using multi-head attention in models like Transformers?
It allows the model to capture different types of relationships and features in the data simultaneously, making the model smarter and better at tasks like translation or text understanding.
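The "different views at the same time" idea can also be sketched directly: each head gets its own projection matrices, and the heads' outputs are concatenated. Again a toy sketch with random weights (the dimensions and helper names here are illustrative assumptions):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    w = softmax(Q @ K.T / np.sqrt(K.shape[-1]), axis=-1)
    return w @ V

def multi_head_attention(X, heads):
    """Each head has its own (Wq, Wk, Wv), so each learns a different 'view'."""
    outs = [attention(X @ Wq, X @ Wk, X @ Wv) for Wq, Wk, Wv in heads]
    return np.concatenate(outs, axis=-1)      # stitch the heads' views together

rng = np.random.default_rng(1)
d_model, d_head, n_heads = 8, 4, 2
X = rng.normal(size=(5, d_model))             # 5 "words"
heads = [tuple(rng.normal(size=(d_model, d_head)) for _ in range(3))
         for _ in range(n_heads)]
out = multi_head_attention(X, heads)
print(out.shape)                              # 2 heads x 4 dims each, concatenated
```

In a real Transformer the concatenated output is passed through one more learned projection; that step is omitted here to keep the sketch short.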
What does self-attention allow a model to do?
Self-attention helps the model consider all words to find which ones are important for understanding each word.
Why is multi-head attention better than single-head attention?
Multi-head attention allows the model to capture different types of information by looking at the input in several ways simultaneously.
In self-attention, what is the role of the 'keys'?
Keys are compared with queries to calculate attention scores that show which words to focus on.
Which of these is NOT a benefit of self-attention?
Self-attention does not reduce input size; it helps understand relationships and allows parallel processing.
What does each 'head' in multi-head attention do?
Each head learns to focus on different aspects or relationships in the input data.
Explain how self-attention works using a simple example of a sentence.
Think about how a word in a sentence can 'look' at other words to understand its meaning better.
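One way to make the hint concrete is a tiny worked example: give each word a small hand-picked vector and let one word "look" at the others via dot-product similarity. The sentence and the 2-d vectors below are purely illustrative assumptions, not learned embeddings:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Toy 2-d "embeddings" for the sentence "the cat sat" (hand-picked, not learned).
words = ["the", "cat", "sat"]
vecs = np.array([[0.1, 0.0],    # "the": mostly neutral filler word
                 [1.0, 0.2],    # "cat"
                 [0.9, 0.3]])   # "sat": points in a similar direction to "cat"

# How much does "sat" attend to each word? Compare its vector with all of them.
scores = vecs @ vecs[2]         # dot-product similarity with "sat"
weights = softmax(scores)       # turn scores into attention weights summing to 1
for w, a in zip(words, weights):
    print(f"{w}: {a:.2f}")
```

With these toy vectors, "sat" puts most of its attention on "cat" and on itself, and very little on "the", which mirrors the intuition that a verb looks at its subject to understand its own role.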
Describe why multi-head attention improves model understanding compared to single-head attention.
Imagine looking at a problem from different angles to get a fuller picture.