
Encoder-decoder with attention in NLP - Cheat Sheet & Quick Revision

Recall & Review
beginner
What is the main purpose of the encoder in an encoder-decoder model?
The encoder reads the input data and converts it into a fixed-size representation (a context vector) that summarizes the input information for the decoder to use.
beginner
Why do we use attention in encoder-decoder models?
Attention helps the decoder focus on different parts of the input sequence at each step, instead of relying on a single fixed context vector. This improves performance, especially for long sequences.
intermediate
Describe how the attention mechanism works in simple terms.
At each step, the decoder looks at all encoder outputs and assigns weights (attention scores) to them. These weights show how important each input part is for generating the current output word.
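The weighting described above can be sketched as simple dot-product attention, one common way to score encoder outputs against the decoder state. The arrays and sizes below are made up purely for illustration:

```python
import numpy as np

def softmax(x):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attention_weights(decoder_state, encoder_outputs):
    # Score each encoder output by its dot product with the decoder state,
    # then normalize the scores into weights that sum to 1.
    scores = encoder_outputs @ decoder_state
    return softmax(scores)

# Toy example: 4 encoder hidden states, hidden size 3.
encoder_outputs = np.array([[1.0, 0.0, 0.0],
                            [0.0, 1.0, 0.0],
                            [0.0, 0.0, 1.0],
                            [1.0, 1.0, 0.0]])
decoder_state = np.array([1.0, 0.0, 0.0])

weights = attention_weights(decoder_state, encoder_outputs)
print(weights)        # largest weights on inputs aligned with the decoder state
print(weights.sum())  # sums to 1 (up to float rounding)
```

The decoder recomputes these weights at every output step, so each generated token can "look at" a different part of the input.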
intermediate
What is the difference between the context vector in a basic encoder-decoder and one with attention?
In a basic model, the context vector is fixed and the same for all output steps. With attention, the context vector changes at each step, computed as a weighted sum of encoder outputs based on attention scores.
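The contrast can be shown numerically: a fixed context is computed once and reused, while an attention context is a fresh weighted sum of all encoder outputs at each step. The encoder outputs and attention weights below are hypothetical values chosen for illustration:

```python
import numpy as np

# Encoder outputs: one hidden state per input token (4 tokens, hidden size 3).
encoder_outputs = np.array([[1.0, 0.0, 0.0],
                            [0.0, 1.0, 0.0],
                            [0.0, 0.0, 1.0],
                            [0.5, 0.5, 0.0]])

# Basic model: the context is a single fixed vector (here, the last
# encoder state), reused unchanged at every decoding step.
fixed_context = encoder_outputs[-1]

# With attention: a new context per step, a weighted sum of ALL encoder
# outputs under that step's attention weights (hypothetical values here).
step1_weights = np.array([0.7, 0.1, 0.1, 0.1])
step2_weights = np.array([0.1, 0.1, 0.7, 0.1])

context_step1 = step1_weights @ encoder_outputs
context_step2 = step2_weights @ encoder_outputs

print(context_step1)  # emphasizes the first input token
print(context_step2)  # emphasizes the third input token
```

Because the weighted sum changes as the weights change, the decoder effectively gets a different summary of the input at each step.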
intermediate
How does attention improve translation quality in machine translation tasks?
Attention allows the model to align output words with relevant input words dynamically, helping it handle long sentences and complex structures better than fixed context models.
What does the encoder output in an encoder-decoder model with attention?
A. The final output sentence
B. A single word prediction
C. A sequence of hidden states representing the input
D. The loss value
In attention, what do the attention weights represent?
A. The length of the input sequence
B. The importance of each input token for the current output token
C. The number of layers in the model
D. The learning rate
Why is a fixed context vector limiting in basic encoder-decoder models?
A. It cannot capture all input details for long sequences
B. It increases training speed
C. It reduces model size
D. It improves output diversity
Which part of the model uses attention scores to generate output?
A. Decoder
B. Encoder
C. Input layer
D. Loss function
What is a common benefit of adding attention to encoder-decoder models?
A. Less training data needed
B. Faster inference without accuracy change
C. Simpler model architecture
D. Better handling of long input sequences
Explain how the attention mechanism changes the way the decoder generates each output token compared to a basic encoder-decoder model.
Think about how the decoder decides what input information to use at each step.
Describe the roles of the encoder, decoder, and attention mechanism in an encoder-decoder model with attention.
Consider how these components work together to produce output.