Recall & Review

beginner

What is the main purpose of the encoder in an encoder-decoder model?

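The encoder reads the input sequence and converts it into a representation the decoder can use. In a basic encoder-decoder, this is a single fixed-size context vector that summarizes the whole input. In a model with attention, the decoder can also access the encoder's hidden state for every input token.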
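
As a rough illustration of an encoder producing a fixed-size context vector, here is a minimal sketch assuming a plain tanh RNN written with the Python standard library. The function name `encode`, the weight matrices `W_xh` and `W_hh`, and all numeric values are hypothetical choices for this example, not part of any particular library.

```python
import math

def encode(inputs, W_xh, W_hh):
    """Run a simple tanh RNN over the input vectors.

    Returns every hidden state plus the final one, which a basic
    encoder-decoder would use as its fixed-size context vector.
    """
    hidden_size = len(W_hh)
    h = [0.0] * hidden_size
    hidden_states = []
    for x in inputs:
        # New state k mixes the current input with the previous state.
        h = [
            math.tanh(
                sum(w * xi for w, xi in zip(W_xh[k], x))
                + sum(w * hj for w, hj in zip(W_hh[k], h))
            )
            for k in range(hidden_size)
        ]
        hidden_states.append(h)
    return hidden_states, hidden_states[-1]

# Hypothetical toy weights: input size 2, hidden size 2.
W_xh = [[0.5, -0.2], [0.1, 0.3]]
W_hh = [[0.1, 0.0], [0.0, 0.1]]

short_input = [[1.0, 0.0], [0.0, 1.0]]
long_input = short_input * 5

states_short, context_short = encode(short_input, W_xh, W_hh)
states_long, context_long = encode(long_input, W_xh, W_hh)

print(len(states_short), len(context_short))  # 2 2
print(len(states_long), len(context_long))    # 10 2
```

The context vector has the same size for the 2-token and the 10-token input, while the list of hidden states grows with the input. That list of per-token states is what an attention mechanism reads from.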

beginner

Why do we use attention in encoder-decoder models?

Attention helps the decoder focus on different parts of the input sequence at each step, instead of relying on a single fixed context vector. This improves performance, especially for long sequences.

intermediate

Describe how the attention mechanism works in simple terms.

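At each step, the decoder compares its current state with every encoder output and computes a score for each one. The scores are normalized (typically with a softmax) into attention weights that sum to 1. Each weight shows how important that part of the input is for generating the current output word.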
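
A single decoder step of this process can be sketched as follows. This is a minimal example that assumes dot-product scoring, which is only one of several scoring functions in use. The helper names and the toy vectors are hypothetical.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(decoder_state, encoder_outputs):
    """Score every encoder output against the decoder state, then normalize."""
    scores = [
        sum(d * h for d, h in zip(decoder_state, enc))
        for enc in encoder_outputs
    ]
    return softmax(scores)

# Hypothetical toy values: three input tokens, hidden size 2.
encoder_outputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [2.0, 0.0]

weights = attention_weights(decoder_state, encoder_outputs)
print([round(w, 3) for w in weights])  # [0.468, 0.063, 0.468]
```

The first and third input tokens score highest against this decoder state, so they receive most of the weight, and the second token is mostly ignored at this step.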

intermediate

What is the difference between the context vector in a basic encoder-decoder and one with attention?

In a basic model, the context vector is computed once and reused unchanged at every output step. With attention, a new context vector is computed at each step as a weighted sum of the encoder outputs, using that step's attention weights.
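
Written as formulas, the contrast looks like this. The notation is an assumption made for this sketch: $h_i$ is the encoder output for input position $i$, $T$ is the input length, $s_{t-1}$ is the decoder state before output step $t$, and $\mathrm{score}$ stands for whichever scoring function the model uses.

```latex
% Basic encoder-decoder: one context vector, reused at every output step
c = h_T

% With attention: a new context vector for each output step t
e_{t,i} = \mathrm{score}(s_{t-1}, h_i), \qquad
\alpha_{t,i} = \frac{\exp(e_{t,i})}{\sum_{j=1}^{T} \exp(e_{t,j})}, \qquad
c_t = \sum_{i=1}^{T} \alpha_{t,i}\, h_i
```

Because the weights $\alpha_{t,i}$ depend on $t$, the context vector $c_t$ can emphasize different input positions at different output steps.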

intermediate

How does attention improve translation quality in machine translation tasks?

Attention allows the model to align each output word with the most relevant input words dynamically, which helps it handle long sentences and complex structures (such as word reordering) better than models that rely on a single fixed context vector.
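
To make the idea of alignment concrete, here is a small sketch that reads an alignment off the attention weights. The sentence pair and all vectors are hand-picked, hypothetical values chosen so that the word order differs between source and target. A trained model would learn such vectors rather than have them set by hand.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

source = ["le", "chat", "noir"]
target = ["the", "black", "cat"]

# Hypothetical hand-picked vectors, not learned values.
encoder_outputs = [[3.0, 0.0, 0.0], [0.0, 3.0, 0.0], [0.0, 0.0, 3.0]]
decoder_states = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]

alignment = []
for word, state in zip(target, decoder_states):
    # One row of attention weights per output word.
    weights = softmax([dot(state, h) for h in encoder_outputs])
    best = max(range(len(source)), key=lambda i: weights[i])
    alignment.append((word, source[best]))
    print(f"{word:>5} -> {source[best]:<5}", [round(w, 2) for w in weights])
```

In this toy setup "black" attends mostly to "noir" and "cat" to "chat", even though the two words appear in the opposite order in the source sentence.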

What does the encoder output in an encoder-decoder model with attention?

A. The final output sentence

B. A single word prediction

C. A sequence of hidden states representing the input

D. The loss value

In attention, what do the attention weights represent?

A. The length of the input sequence

B. The importance of each input token for the current output token

C. The number of layers in the model

D. The learning rate

Why is a fixed context vector limiting in basic encoder-decoder models?

A. It cannot capture all input details for long sequences

B. It increases training speed

C. It reduces model size

D. It improves output diversity

Which part of the model uses attention scores to generate output?

A. Decoder

B. Encoder

C. Input layer

D. Loss function

What is a common benefit of adding attention to encoder-decoder models?

A. Less training data needed

B. Faster inference without accuracy change

C. Simpler model architecture

D. Better handling of long input sequences

Explain how the attention mechanism changes the way the decoder generates each output token compared to a basic encoder-decoder model.

Describe the roles of the encoder, decoder, and attention mechanism in an encoder-decoder model with attention.

Practice

(1/5)

1. What is the main purpose of the attention mechanism in an encoder-decoder model?

easy

A. To randomly select input tokens for the decoder

B. To help the model focus on relevant parts of the input sequence when generating each output token

C. To speed up the training by skipping some input tokens

D. To reduce the size of the input data before encoding

Solution

Step 1: Understand the role of attention in sequence models

Step 2: Identify the correct purpose

Final Answer:

Quick Check:

Solution

Step 1: Recall attention weight calculation

Step 2: Match the correct formula

Final Answer:

Quick Check:

Solution

Step 1: Analyze tensor shapes in batch matrix multiplication

Step 2: Remove last dimension and apply softmax

Final Answer:

Quick Check:

Solution

Step 1: Understand uniform attention weights meaning

Step 2: Identify missing softmax effect

Final Answer:

Quick Check:

Solution

Step 1: Identify challenges with long sentences

Step 2: Understand multi-head attention benefits

Final Answer:

Quick Check: