Recall & Review

beginner

What is self-attention in simple terms?

Self-attention lets a model look at every word in a sentence at once and decide, for each word, which of the other words matter most for understanding it.
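A minimal NumPy sketch of the idea, simplified on purpose: it uses the word vectors directly as queries, keys, and values (a real Transformer first multiplies them by learned weight matrices). The word vectors and the function name `simple_self_attention` are hypothetical, chosen only for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def simple_self_attention(X):
    """X: (n_words, d) array of word vectors.
    Simplification: X is used directly as queries, keys and values (no learned weights)."""
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)       # (n, n): how strongly each word relates to every word
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ X, weights         # each output row is a weighted mix of all word vectors

# Hypothetical 4-dimensional vectors for a 3-word sentence
X = np.array([[1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 0.0, 1.0],
              [1.0, 1.0, 0.0, 0.0]])
out, w = simple_self_attention(X)
print(w.round(2))
```

Every word is compared with every other word in a single matrix product, which is what "looking at all parts of the sentence at once" means in practice.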


intermediate

Why do we use multi-head attention instead of a single attention head?

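Multi-head attention runs several attention operations ("heads") in parallel, each with its own learned projections. This lets the model look at the sentence from several perspectives at the same time, so it can capture more kinds of details and relationships than a single head.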
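A minimal NumPy sketch of multi-head attention, assuming the usual setup where the model dimension is split evenly across heads. The weight matrices are random stand-ins for learned weights, and the function name `multi_head_attention` is hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(X, Wq, Wk, Wv, Wo, num_heads):
    n, d_model = X.shape
    d_head = d_model // num_heads

    def split(M):
        # (n, d_model) -> (num_heads, n, d_head): each head gets its own slice of features
        return M.reshape(n, num_heads, d_head).transpose(1, 0, 2)

    Q, K, V = split(X @ Wq), split(X @ Wk), split(X @ Wv)
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_head)    # (num_heads, n, n)
    weights = softmax(scores, axis=-1)                     # one attention pattern per head
    heads = weights @ V                                    # (num_heads, n, d_head)
    concat = heads.transpose(1, 0, 2).reshape(n, d_model)  # join the heads back together
    return concat @ Wo, weights

rng = np.random.default_rng(0)
n, d_model, num_heads = 5, 8, 2
X = rng.normal(size=(n, d_model))
Wq, Wk, Wv, Wo = (rng.normal(size=(d_model, d_model)) for _ in range(4))
out, weights = multi_head_attention(X, Wq, Wk, Wv, Wo, num_heads)
print(out.shape, weights.shape)  # (5, 8) (2, 5, 5)
```

The `weights` array holds a separate n × n attention pattern for each head, which is the "different views at the same time" in concrete form.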


intermediate

In self-attention, what are queries, keys, and values?

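Queries, keys, and values are three vectors computed from each input word's embedding by multiplying it with three learned weight matrices. The model compares each word's query with every word's key (using a dot product) to score how relevant each word is, turns those scores into weights with a softmax, and then takes a weighted sum of the values to produce the final output.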
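A minimal sketch of these steps for a single word, assuming scaled dot-product attention with one head. The embeddings and weight matrices are random placeholders for what a trained model would learn.

```python
import numpy as np

rng = np.random.default_rng(42)
d_model, d_k = 4, 3
X = rng.normal(size=(3, d_model))      # 3 hypothetical word embeddings

# Learned in a real model; random here for illustration
W_q = rng.normal(size=(d_model, d_k))
W_k = rng.normal(size=(d_model, d_k))
W_v = rng.normal(size=(d_model, d_k))

Q, K, V = X @ W_q, X @ W_k, X @ W_v    # one query, key and value per word

# Attention for word 0: compare its query with every key
scores = K @ Q[0] / np.sqrt(d_k)       # shape (3,): one relevance score per word
weights = np.exp(scores - scores.max())
weights /= weights.sum()               # softmax: the weights now sum to 1
output_0 = weights @ V                 # weighted sum of the value vectors

print(weights.round(3), output_0.round(3))
```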


beginner

How does self-attention help in understanding the meaning of a word in a sentence?

Self-attention gives more weight to the words that matter for understanding a given word's meaning, whether those related words are nearby or far away in the sentence.
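A toy illustration with hand-picked, hypothetical numbers: the key vectors have an "animal-like" first feature, and the query for "it" looks for that feature, so most of the attention from "it" goes to "cat" even though other words sit in between. A trained model learns such vectors instead of having them set by hand.

```python
import numpy as np

tokens = ["the", "cat", "was", "tired", "so", "it", "slept"]
# Hypothetical 2-d key vectors: [animal-like feature, other feature]
keys = np.array([[0.0, 1.0], [1.0, 0.0], [0.0, 0.5], [0.2, 0.3],
                 [0.0, 0.8], [0.5, 0.2], [0.3, 0.1]])
query_it = np.array([3.0, 0.0])  # hypothetical query for "it": looks for animal-like words

scores = keys @ query_it / np.sqrt(keys.shape[1])
weights = np.exp(scores - scores.max())
weights /= weights.sum()         # softmax over all words in the sentence

for tok, w in zip(tokens, weights):
    print(f"{tok:>6}: {w:.2f}")
print("most attended:", tokens[int(np.argmax(weights))])  # most attended: cat
```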


intermediate

What is the main benefit of using multi-head attention in models like Transformers?

It allows the model to capture different types of relationships and features in the data simultaneously, which generally improves performance on tasks such as translation and text understanding.


What does self-attention allow a model to do?

A. Ignore the order of words completely

B. Look at all words in a sentence to find important ones

C. Only focus on the first word in a sentence

D. Translate sentences without any context

Why is multi-head attention better than single-head attention?

A. It looks at the input from multiple perspectives at once

B. It uses less memory

C. It ignores irrelevant words

D. It only focuses on one word at a time

In self-attention, what is the role of the 'keys'?

A. They are ignored during attention

B. They store the final output

C. They represent the sentence length

D. They are compared with queries to find important words

Which of these is NOT a benefit of self-attention?

A. Capturing relationships between distant words

B. Understanding word importance in context

C. Reducing the size of the input data

D. Allowing parallel processing of words

What does each 'head' in multi-head attention do?

A. Focuses on different parts or features of the input

B. Processes the entire sentence identically

C. Removes irrelevant words

D. Generates random outputs

Explain how self-attention works using a simple example of a sentence.

Describe why multi-head attention improves model understanding compared to single-head attention.


1. What is the main purpose of self-attention in natural language processing?

easy

A. To reduce the size of the input data by removing words

B. To generate random sentences without context

C. To translate text from one language to another

D. To let the model focus on important words by comparing all words to each other

Self-attention and multi-head attention in NLP - Cheat Sheet & Quick Revision


Solution

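Step 1: Understand self-attention's role. Self-attention compares every word in the sentence with every other word to decide which ones matter most for understanding each word.

Step 2: Match purpose with options. Options A, B, and C describe removing words, generating random text, and translation, none of which is the core purpose of self-attention. Only D describes comparing all words with each other to find the important ones.

Final Answer: D

Quick Check: Self-attention weighs every word against every other word, which matches option D.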

Solution

Step 1: Recall multi-head attention definition

Step 2: Compare options to definition

Final Answer:

Quick Check:

Solution

Step 1: Extract the second row scores

Step 2: Apply softmax to these scores

Final Answer:

Quick Check:
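The question for this solution is not included here, so the sketch below uses a made-up 3 × 3 score matrix to illustrate the two steps: take the second row (index 1), then apply softmax so the scores become weights that sum to 1.

```python
import numpy as np

# Hypothetical 3x3 attention score matrix (rows = queries, columns = keys)
scores = np.array([[1.0, 0.5, 0.0],
                   [2.0, 1.0, 0.1],
                   [0.3, 0.3, 0.3]])

row = scores[1]                    # Step 1: the second row (index 1)
exp_row = np.exp(row - row.max())  # Step 2: softmax, shifted by the max for stability
weights = exp_row / exp_row.sum()
print(weights.round(3))            # ≈ [0.659 0.242 0.099]
```

Subtracting the row maximum does not change the softmax result; it only keeps `np.exp` from overflowing on large scores.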

Solution

Step 1: Analyze softmax calculation

Step 2: Check output aggregation

Final Answer:

Quick Check:
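The code this solution debugs is not included here, so the sketch below only assumes the two checks its steps name: softmax must normalize each query's scores over the keys (the last axis), and the output must be the weights multiplied by the values. The random Q, K, and V are placeholders.

```python
import numpy as np

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(1)
n, d = 4, 3
Q, K, V = (rng.normal(size=(n, d)) for _ in range(3))
scores = Q @ K.T / np.sqrt(d)

# Common bug: normalizing over axis 0 makes each *column* sum to 1, not each row
buggy = softmax(scores, axis=0)
correct = softmax(scores, axis=-1)  # each query's weights over all keys sum to 1

# Aggregation: each output row is a weighted sum of the value vectors
output = correct @ V                # (n, n) @ (n, d) -> (n, d)
manual_row0 = sum(correct[0, j] * V[j] for j in range(n))
print(output.shape, np.allclose(output[0], manual_row0))  # (4, 3) True
```

A quick sanity check for any attention implementation is that the weight matrix rows sum to 1 and the output has one row per query with the same width as the values.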

Solution

Step 1: Understand effect of increasing attention heads

Step 2: Consider computational cost and accuracy

Final Answer:

Quick Check:
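The question behind this solution is not shown, so this sketch only illustrates the cost side of Step 2, assuming the standard setup where d_model is split evenly across heads (d_head = d_model / num_heads). Under that assumption the projection parameters and the multiply count for the scores stay constant as heads are added, while the number of n × n attention-map entries, and the memory to hold them, grows linearly with the number of heads. The function `attention_costs` is hypothetical.

```python
def attention_costs(n, d_model, num_heads):
    """Rough per-layer counts for multi-head attention,
    assuming d_head = d_model // num_heads (the usual even split)."""
    d_head = d_model // num_heads
    projection_params = 4 * d_model * d_model     # W_Q, W_K, W_V, W_O (biases ignored)
    score_mults = num_heads * n * n * d_head      # Q @ K^T summed over all heads
    attention_map_entries = num_heads * n * n     # stored softmax weights
    return projection_params, score_mults, attention_map_entries

for h in (1, 2, 4, 8):
    print(h, attention_costs(n=128, d_model=512, num_heads=h))
```

If each head instead keeps a fixed d_head, every count grows linearly with the number of heads. Whether extra heads improve accuracy depends on the task and has to be measured; this calculation says nothing about it.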