
Transformer architecture overview in Prompt Engineering / GenAI - Cheat Sheet & Quick Revision

Recall & Review
beginner
What is the main purpose of the Transformer architecture in AI?
The Transformer architecture is designed to process sequences of data, like sentences, by focusing on relationships between all parts of the sequence at once, enabling better understanding and generation of language.
beginner
What does 'self-attention' mean in the Transformer model?
Self-attention is a mechanism where the model looks at all words in a sentence to decide which words are important to understand each word better, helping it capture context effectively.
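The self-attention idea above can be sketched in a few lines of NumPy. This is a deliberately simplified version: it uses the input vectors directly as queries, keys, and values, whereas a real Transformer layer first multiplies the input by learned projection matrices (W_Q, W_K, W_V); all shapes and names here are illustrative.

```python
import numpy as np

def self_attention(X):
    """Scaled dot-product self-attention over a sequence X of shape (seq_len, d).

    Simplification: queries, keys, and values are all X itself (no learned
    projections). Each output row is a weighted mix of ALL input rows, which
    is how the model 'looks at every word to understand each word'.
    """
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)  # similarity of every word to every other word
    # softmax over each row, so attention weights for a word sum to 1
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ X  # each word becomes a context-weighted blend of all words

# 3 "words", each represented by a 4-dimensional vector
X = np.random.rand(3, 4)
out = self_attention(X)
print(out.shape)  # (3, 4): same shape as the input, but context-mixed
```

Because the weights in each row sum to 1, every output vector is a convex combination of the input vectors, i.e. a context-aware average of the whole sequence.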
beginner
Name the two main parts of the Transformer architecture.
The Transformer has two main parts: the Encoder, which reads and understands the input data, and the Decoder, which generates the output based on the Encoder's understanding.
intermediate
Why does the Transformer use 'positional encoding'?
Because Transformers process all words at once, positional encoding adds information about the order of words so the model knows the sequence in which words appear.
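A common way to add that order information is the sinusoidal positional encoding from the original Transformer paper; here is a minimal NumPy sketch (dimension sizes are illustrative).

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encodings.

    PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))

    Each position gets a unique pattern of sines and cosines, which is
    added to the word embeddings so the model can tell word order apart.
    """
    pos = np.arange(seq_len)[:, None]          # (seq_len, 1) positions
    i = np.arange(0, d_model, 2)[None, :]      # (1, d_model/2) even dims
    angles = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)               # even dimensions
    pe[:, 1::2] = np.cos(angles)               # odd dimensions
    return pe

pe = positional_encoding(seq_len=10, d_model=16)
print(pe.shape)  # (10, 16): one encoding vector per position
# Position 0 encodes as sin(0)=0 on even dims and cos(0)=1 on odd dims
```

In practice this matrix is simply added to the embedding matrix before the first encoder layer, so identical words at different positions produce different inputs.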
intermediate
How does the Transformer differ from older sequence models like RNNs?
Unlike RNNs that process words one by one, Transformers look at all words simultaneously using self-attention, which allows faster training and better understanding of long-range relationships.
What is the role of the Encoder in a Transformer?
A. To generate the output sequence
B. To read and understand the input data
C. To add positional information
D. To perform self-attention only on output
Answer: B
What does self-attention help the Transformer model do?
A. Ignore word order
B. Process words one at a time
C. Focus on important parts of the input sequence
D. Reduce the size of the input
Answer: C
Why is positional encoding necessary in Transformers?
A. To increase vocabulary size
B. To speed up training
C. To reduce model size
D. Because Transformers do not process data sequentially
Answer: D
Which part of the Transformer generates the final output?
A. Decoder
B. Encoder
C. Self-attention layer
D. Positional encoding
Answer: A
How does the Transformer improve over RNNs?
A. Processes sequences in parallel using self-attention
B. Processes sequences strictly one word at a time
C. Uses fewer layers
D. Ignores context
Answer: A
Explain how self-attention works in the Transformer architecture and why it is important.
Think about how the model decides which words to focus on when reading a sentence.
Describe the roles of the Encoder and Decoder in the Transformer model.
Consider the flow from input to output in a translation task.