Practice - 5 Tasks

Answer the questions below

1. Fill in the blank

easy

Complete the code to compute the attention scores by multiplying queries and keys.

NLP

attention_scores = torch.matmul(queries, [1].transpose(-2, -1))

A. queries

B. keys

C. values

D. weights
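To make the shapes concrete, here is a minimal, dependency-free sketch in plain Python of what a query-key score matrix contains for a single unbatched case: entry `[i][j]` is the dot product of query `i` with key `j`, which is what multiplying the queries by the transposed keys produces. The helper name `attention_scores` and the toy vectors are assumptions made for illustration and are not part of the exercise.

```python
# Plain-Python stand-in for a (queries x keys^T) matrix product.
# The helper name and the toy vectors are hypothetical, chosen for illustration.

def attention_scores(queries, keys):
    """Return scores[i][j] = dot(queries[i], keys[j])."""
    return [
        [sum(q_n * k_n for q_n, k_n in zip(q, k)) for k in keys]
        for q in queries
    ]

queries = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 query vectors, dimension 2
keys = [[1.0, 2.0], [3.0, 4.0]]                 # 2 key vectors, dimension 2

scores = attention_scores(queries, keys)
print(scores)  # one row per query, one column per key
```

The result has one row per query and one column per key, so three queries and two keys give a 3 x 2 score matrix.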

2. Fill in the blank

medium

Complete the code to scale the attention scores by the square root of the key dimension.

scaled_scores = attention_scores / math.sqrt([1])

A. value_dim

B. batch_size

C. key_dim

D. query_dim
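The reason for scaling can be seen in a small numeric sketch: large dot products push softmax toward a nearly one-hot distribution, while dividing by the square root of the key dimension keeps the weights smoother. The value `key_dim = 64` and the raw scores below are assumed purely for illustration, and `softmax` is a hand-written helper rather than a library call.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

key_dim = 64             # assumed key dimension for this toy example
raw_scores = [8.0, 0.0]  # illustrative unscaled dot products

# Divide every score by sqrt(key_dim), as in scaled dot-product attention.
scaled_scores = [s / math.sqrt(key_dim) for s in raw_scores]

raw_weights = softmax(raw_scores)        # almost all weight on the first entry
scaled_weights = softmax(scaled_scores)  # weight spread more evenly
print(raw_weights, scaled_weights)
```

With these numbers the scaled scores become `[1.0, 0.0]`, so the second entry keeps a meaningful share of the weight instead of being driven close to zero.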

3. Fill in the blank

hard

Complete the code so that softmax is applied to the attention scores along the correct dimension.

attention_weights = torch.nn.functional.softmax(attention_scores, dim=[1])

A. -1

D. -2
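The choice of axis matters because it decides what sums to 1. The sketch below, in plain Python with a hand-written `softmax` helper, assumes a score matrix with one row per query and one column per key, and compares normalizing over the last axis (each row) with normalizing over the second-to-last axis (each column). The matrix values are illustrative only.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

# Assumed layout: rows index queries, columns index keys.
scores = [[1.0, 2.0, 3.0],
          [1.0, 1.0, 1.0]]

# Last axis: every query's weights over the keys form a distribution.
weights_last = [softmax(row) for row in scores]

# Second-to-last axis: every column is normalized instead.
columns = [softmax(list(col)) for col in zip(*scores)]
weights_second_last = [list(row) for row in zip(*columns)]

row_sums_last = [sum(row) for row in weights_last]
row_sums_second_last = [sum(row) for row in weights_second_last]
print(row_sums_last, row_sums_second_last)
```

Only the last-axis version gives each query a set of weights that sums to 1, which is what a weighted average of the values requires.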

4. Fill in the blank

hard

Fill both blanks to compute multi-head attention output by concatenating heads and applying a linear layer.

multihead_output = self.linear_out(torch.cat([1], dim=[2]))

A. attended_heads

C. -1

D. heads
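As a rough sketch of the concatenate-then-project step, the plain-Python example below joins per-head outputs along the feature (last) axis and then applies a bias-free linear map. The names `head_outputs`, `concat_last_dim`, and `linear`, the toy numbers, and the weight matrix are all hypothetical; a real model would use learned weights and tensor operations.

```python
def concat_last_dim(head_outputs):
    """Join per-head outputs feature-wise.

    Each head is a [seq_len][head_dim] list; the result is
    [seq_len][num_heads * head_dim].
    """
    return [
        [x for head_row in rows for x in head_row]
        for rows in zip(*head_outputs)
    ]

def linear(x, weight):
    """Bias-free linear map: each output row is weight applied to an input row."""
    return [
        [sum(w * v for w, v in zip(w_row, row)) for w_row in weight]
        for row in x
    ]

head_outputs = [
    [[1.0, 2.0], [3.0, 4.0]],  # head 0: 2 positions, head_dim 2
    [[5.0, 6.0], [7.0, 8.0]],  # head 1: 2 positions, head_dim 2
]

concatenated = concat_last_dim(head_outputs)  # shape [2][4]

# Hypothetical 2 x 4 output weights that simply add the two heads together.
weight = [[1.0, 0.0, 1.0, 0.0],
          [0.0, 1.0, 0.0, 1.0]]
multihead_output = linear(concatenated, weight)
print(concatenated, multihead_output)
```

Concatenating along the last axis keeps the sequence length unchanged and widens each position's feature vector, so the linear layer sees all heads for a given position at once.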

5. Fill in the blank

hard

Fill all three blanks to implement scaled dot-product attention: compute scores, apply softmax, and multiply by values.

scores = torch.matmul(queries, [1].transpose(-2, -1)) / math.sqrt([2])
weights = torch.nn.functional.softmax(scores, dim=[3])
output = torch.matmul(weights, values)

A. keys

B. key_dim

C. -1

D. queries
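The three steps named in this task (scores, softmax, weighted sum of values) can be traced end to end in a small, dependency-free sketch. It handles a single unbatched sequence with plain Python lists; the helper names `matmul`, `transpose`, `softmax`, and `scaled_dot_product_attention` and the toy inputs are assumptions for illustration, not the exercise's reference code.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    """Matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def scaled_dot_product_attention(queries, keys, values):
    key_dim = len(keys[0])
    # Step 1: query-key dot products, scaled by sqrt(key_dim).
    scores = matmul(queries, transpose(keys))
    scaled = [[s / math.sqrt(key_dim) for s in row] for row in scores]
    # Step 2: softmax over the keys (last axis) for every query.
    weights = [softmax(row) for row in scaled]
    # Step 3: weighted sum of the value vectors.
    return matmul(weights, values), weights

queries = [[1.0, 0.0], [0.0, 1.0]]
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]

output, weights = scaled_dot_product_attention(queries, keys, values)
print(weights, output)
```

Each query here matches one key more strongly than the other, so its output row leans toward the corresponding value vector without ignoring the other one.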

Practice

(1/5)

1. What is the main purpose of self-attention in natural language processing?

easy

A. To reduce the size of the input data by removing words

B. To generate random sentences without context

C. To translate text from one language to another

D. To let the model focus on important words by comparing all words to each other

Self-attention and multi-head attention in NLP - Interactive Code Practice

Solution

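Step 1: Understand self-attention's role

Self-attention compares every word in a sequence with every other word and weights each one by how relevant it is.

Step 2: Match purpose with options

Only option D describes this. Options A, B, and C describe removing words, random generation, and translation, none of which is the purpose of self-attention.

Final Answer:

D. To let the model focus on important words by comparing all words to each other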

Quick Check:

Solution

Step 1: Recall multi-head attention definition

Step 2: Compare options to definition

Final Answer:

Quick Check:

Solution

Step 1: Extract the second row scores

Step 2: Apply softmax to these scores

Final Answer:

Quick Check:

Solution

Step 1: Analyze softmax calculation

Step 2: Check output aggregation

Final Answer:

Quick Check:

Solution

Step 1: Understand effect of increasing attention heads

Step 2: Consider computational cost and accuracy

Final Answer:

Quick Check: