NLP · ML · ~5 mins

Self-attention and multi-head attention in NLP - Cheat Sheet & Quick Revision

Recall & Review
beginner
What is self-attention in simple terms?
Self-attention is a mechanism that lets a model look at all the words in a sentence at once and decide which other words are most relevant for understanding each word.
intermediate
Why do we use multi-head attention instead of just one attention?
Multi-head attention lets the model look at the sentence from different views or angles at the same time, helping it understand more details and relationships.
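To make the "different views at once" idea concrete, here is a minimal NumPy sketch of multi-head attention. The function name, dimensions, and random weights are illustrative assumptions, not part of the card; a trained model would learn the projection matrices.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_attention(X, n_heads, rng):
    seq_len, d_model = X.shape
    d_head = d_model // n_heads
    heads = []
    for _ in range(n_heads):
        # Each head has its own Q/K/V projections, so each head can
        # focus on a different kind of relationship in the sentence.
        W_q, W_k, W_v = (rng.normal(size=(d_model, d_head)) for _ in range(3))
        Q, K, V = X @ W_q, X @ W_k, X @ W_v
        weights = softmax(Q @ K.T / np.sqrt(d_head))
        heads.append(weights @ V)
    # Concatenate all heads and mix them with an output projection.
    W_o = rng.normal(size=(d_model, d_model))
    return np.concatenate(heads, axis=-1) @ W_o

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 16))   # 5 tokens, model dimension 16
out = multi_head_attention(X, n_heads=4, rng=rng)
print(out.shape)               # (5, 16)
```

Note that the output has the same shape as the input: splitting the model dimension across heads and concatenating afterwards keeps the cost comparable to single-head attention while adding the multiple "views".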
intermediate
In self-attention, what are queries, keys, and values?
Queries, keys, and values are three sets of numbers made from the input words. The model compares queries with keys to find important words, then uses values to get the final information.
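The query/key/value comparison described above can be sketched in a few lines of NumPy. This is scaled dot-product self-attention; the matrix names and sizes are illustrative assumptions (in a real model the projection weights are learned).

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    # Project the input word vectors into queries, keys, and values.
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    d_k = K.shape[-1]
    # Compare every query with every key, scale, and normalize
    # so each word gets a set of attention weights summing to 1.
    weights = softmax(Q @ K.T / np.sqrt(d_k))
    # Each output is a weighted mix of the value vectors.
    return weights @ V, weights

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))    # 4 "words", embedding dimension 8
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
out, weights = self_attention(X, W_q, W_k, W_v)
print(out.shape)               # (4, 8): one updated vector per word
```

Each row of `weights` sums to 1, so every word's new representation is a convex mix of the value vectors, weighted by how well its query matches each key.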
beginner
How does self-attention help in understanding the meaning of a word in a sentence?
Self-attention helps by giving more focus to words that matter for understanding a word’s meaning, like paying attention to related words nearby or far away in the sentence.
intermediate
What is the main benefit of using multi-head attention in models like Transformers?
It allows the model to capture different types of relationships and features in the data simultaneously, making the model smarter and better at tasks like translation or text understanding.
What does self-attention allow a model to do?
A. Ignore the order of words completely
B. Look at all words in a sentence to find important ones
C. Only focus on the first word in a sentence
D. Translate sentences without any context
Why is multi-head attention better than single-head attention?
A. It looks at the input from multiple perspectives at once
B. It uses less memory
C. It ignores irrelevant words
D. It only focuses on one word at a time
In self-attention, what is the role of the 'keys'?
A. They are ignored during attention
B. They store the final output
C. They represent the sentence length
D. They are compared with queries to find important words
Which of these is NOT a benefit of self-attention?
A. Capturing relationships between distant words
B. Understanding word importance in context
C. Reducing the size of the input data
D. Allowing parallel processing of words
What does each 'head' in multi-head attention do?
A. Focuses on different parts or features of the input
B. Processes the entire sentence identically
C. Removes irrelevant words
D. Generates random outputs
Explain how self-attention works using a simple example of a sentence.
Think about how a word in a sentence can 'look' at other words to understand its meaning better.
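The hint above can be turned into a tiny worked example. Here a toy word "it" "looks" at the other words via dot products; the 2-D vectors are hand-picked for illustration (real embeddings are learned and much larger), deliberately making "it" and "cat" point the same way.

```python
import numpy as np

# Hand-picked toy 2-D vectors: "cat" and "it" point in similar
# directions, so "it" should attend mostly to "cat" (and itself).
words = ["the", "cat", "sat", "it"]
vecs = np.array([[0.1, 0.0],
                 [1.0, 0.9],
                 [0.0, 0.8],
                 [0.9, 1.0]])

q = vecs[3]                          # query for "it"
scores = vecs @ q                    # compare with every word's key
weights = np.exp(scores) / np.exp(scores).sum()   # softmax
for word, w in zip(words, weights):
    print(f"{word}: {w:.2f}")
```

Running this, "cat" gets a much larger weight than "the" or "sat": the attention mechanism has linked the pronoun to its likely referent purely through vector similarity.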
Describe why multi-head attention improves model understanding compared to single-head attention.
Imagine looking at a problem from different angles to get a fuller picture.