
Transformer architecture overview in Prompt Engineering / GenAI - Interactive Code Practice

Practice - 5 Tasks
Answer the questions below
Task 1: Fill in the blank (easy)

Complete the code to create the input embedding layer for a Transformer model.

embedding_layer = nn.Embedding(num_tokens, [1])
A. embedding_dim
B. num_heads
C. num_layers
D. dropout_rate
Common Mistakes
Using number of heads instead of embedding dimension.
Confusing number of layers with embedding size.
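
For reference, a minimal runnable sketch of the pattern this task exercises, assuming PyTorch; num_tokens = 10000 and embedding_dim = 512 are illustrative values, not part of the exercise.

import torch
import torch.nn as nn

num_tokens = 10000      # vocabulary size (illustrative)
embedding_dim = 512     # width of each token vector (illustrative)

# nn.Embedding maps integer token ids to dense vectors of size embedding_dim.
embedding_layer = nn.Embedding(num_tokens, embedding_dim)

token_ids = torch.tensor([[1, 5, 42, 7]])    # shape (batch=1, seq_len=4)
token_vectors = embedding_layer(token_ids)   # shape (1, 4, embedding_dim)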
Task 2: Fill in the blank (medium)

Complete the code to apply multi-head attention in the Transformer encoder block.

attention_output, _ = multihead_attn(query, key, value, [1]=key_padding_mask)
A. bias
B. dropout
C. attn_mask
D. key_padding_mask
Common Mistakes
Using attn_mask instead of key_padding_mask for padding.
Passing a dropout parameter here instead of the padding mask.
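
A minimal self-attention sketch showing where key_padding_mask goes, assuming PyTorch's nn.MultiheadAttention; the tensor shapes, mask values, and batch_first=True layout are illustrative assumptions.

import torch
import torch.nn as nn

embed_dim, num_heads = 512, 8     # illustrative sizes
multihead_attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

x = torch.randn(2, 6, embed_dim)  # (batch, seq_len, embed_dim)
# True marks padded positions that attention should ignore.
key_padding_mask = torch.tensor([[False, False, False, False, False, False],
                                 [False, False, False, False, True,  True]])

# Self-attention: query, key, and value are all the same tensor here.
attention_output, _ = multihead_attn(x, x, x, key_padding_mask=key_padding_mask)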
Task 3: Fill in the blank (hard)

Fix the error in the Transformer feed-forward network layer by filling in the missing activation function.

ffn_output = linear2([1](linear1(x)))
A. sigmoid
B. relu
C. softmax
D. tanh
Common Mistakes
Using softmax, which produces a probability distribution and is not used as the hidden-layer activation here.
Using sigmoid or tanh, which are uncommon in Transformer feed-forward networks.
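
A sketch of the position-wise feed-forward sub-layer with the conventional ReLU activation; embed_dim = 512 and ffn_dim = 2048 mirror the sizes from the original Transformer paper and are only illustrative here.

import torch
import torch.nn as nn
import torch.nn.functional as F

embed_dim, ffn_dim = 512, 2048            # illustrative sizes
linear1 = nn.Linear(embed_dim, ffn_dim)   # expand
linear2 = nn.Linear(ffn_dim, embed_dim)   # project back

x = torch.randn(2, 6, embed_dim)
# Expand, apply the non-linearity, then project back to the model width.
ffn_output = linear2(F.relu(linear1(x)))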
Task 4: Fill in the blank (hard)

Fill both blanks to create a positional encoding function that adds position information to token embeddings.

positional_encoding = torch.zeros(seq_len, [1])
for pos in range(seq_len):
    for i in range(0, [2], 2):
        positional_encoding[pos, i] = math.sin(pos / (10000 ** (i / [2])))
A. embedding_dim
B. seq_len
C. num_heads
D. batch_size
Common Mistakes
Using the sequence length for the second blank, which should be the embedding dimension.
Confusing the number of heads or the batch size with the embedding dimension.
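
A sketch of sinusoidal positional encoding; the exercise snippet fills only the even (sine) indices, so the cosine line for odd indices below goes beyond the exercise and is included only for completeness. seq_len = 50 and embedding_dim = 512 are illustrative.

import math
import torch

seq_len, embedding_dim = 50, 512      # illustrative sizes

positional_encoding = torch.zeros(seq_len, embedding_dim)
for pos in range(seq_len):
    for i in range(0, embedding_dim, 2):
        angle = pos / (10000 ** (i / embedding_dim))
        positional_encoding[pos, i] = math.sin(angle)             # even dims: sine
        if i + 1 < embedding_dim:
            positional_encoding[pos, i + 1] = math.cos(angle)     # odd dims: cosine (not in the exercise)

The encoding is added element-wise to the token embeddings, which is why its second dimension must match embedding_dim rather than seq_len.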
Task 5: Fill in the blank (hard)

Fill all three blanks to complete the Transformer encoder layer with normalization and residual connections.

x = x + [1](multihead_attn(x, x, x))
x = [2](x)
residual = x
x = x + [3](feed_forward(x))
A. layer_norm
B. dropout
C. relu
Common Mistakes
Mixing up dropout and layer normalization order.
Using activation functions instead of dropout or normalization.
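
A post-norm encoder-layer sketch combining the three pieces: dropout on each sub-layer output, a residual add, then layer normalization. Module sizes and the dropout rate are illustrative, and since nn.MultiheadAttention returns a tuple, the attention output is unpacked before the residual add.

import torch
import torch.nn as nn

embed_dim, num_heads = 512, 8         # illustrative sizes
multihead_attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
feed_forward = nn.Sequential(nn.Linear(embed_dim, 2048), nn.ReLU(),
                             nn.Linear(2048, embed_dim))
dropout = nn.Dropout(0.1)             # illustrative rate
layer_norm1 = nn.LayerNorm(embed_dim)
layer_norm2 = nn.LayerNorm(embed_dim)

x = torch.randn(2, 6, embed_dim)

# Self-attention sub-layer: residual add, dropout on the sub-layer output, then normalize.
attn_output, _ = multihead_attn(x, x, x)
x = layer_norm1(x + dropout(attn_output))

# Feed-forward sub-layer: the same residual + dropout + normalization pattern.
x = layer_norm2(x + dropout(feed_forward(x)))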