
Positional encoding in PyTorch - Cheat Sheet & Quick Revision

Recall & Review
beginner
What is positional encoding in the context of machine learning models like Transformers?
Positional encoding is a way to add information about the order or position of elements in a sequence, so the model knows where each element is located since models like Transformers do not process data in order by default.
beginner
Why do Transformers need positional encoding?
Transformers process all input tokens simultaneously without any inherent order. Positional encoding helps the model understand the sequence order, which is important for tasks like language understanding.
intermediate
How is sinusoidal positional encoding calculated?
It uses sine and cosine functions of different frequencies to create unique position vectors. For position pos and dimension i: PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)). This helps the model learn relative positions.
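The two formulas above translate directly into a few lines of PyTorch. This is a sketch, not a library function; the helper name `sinusoidal_positional_encoding` is made up for illustration, and it assumes an even `d_model`:

```python
import math
import torch

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> torch.Tensor:
    """Build the (seq_len, d_model) sinusoidal positional encoding table.

    Assumes d_model is even, so sin/cos pairs fill the table exactly.
    """
    position = torch.arange(seq_len).unsqueeze(1)  # (seq_len, 1)
    # 1 / 10000^(2i/d_model), computed in log space for numerical stability
    div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)  # even dimensions use sine
    pe[:, 1::2] = torch.cos(position * div_term)  # odd dimensions use cosine
    return pe

pe = sinusoidal_positional_encoding(seq_len=50, d_model=64)
print(pe.shape)  # torch.Size([50, 64])
```

Note that at position 0 every sine entry is 0 and every cosine entry is 1, which is a quick sanity check for the implementation.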
intermediate
What is the shape of the positional encoding tensor for a batch of sequences?
The positional encoding tensor usually has shape (sequence_length, embedding_dimension). When added to input embeddings, it matches the input shape (batch_size, sequence_length, embedding_dimension) by broadcasting over the batch.
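The broadcasting described above is easy to verify; in this sketch, zeros and random values stand in for a real encoding table and real embeddings:

```python
import torch

batch_size, seq_len, d_model = 2, 10, 16
embeddings = torch.randn(batch_size, seq_len, d_model)  # (batch, seq, dim)
pe = torch.zeros(seq_len, d_model)                      # (seq, dim) placeholder table

# The (seq_len, d_model) table broadcasts over the batch dimension on addition.
out = embeddings + pe
print(out.shape)  # torch.Size([2, 10, 16])
```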
beginner
How does adding positional encoding affect the input embeddings in a Transformer?
Positional encoding is added element-wise to the input embeddings. This combined input carries both the token meaning and its position, allowing the Transformer to use position information during training and prediction.
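A minimal sketch of this element-wise addition as a PyTorch module; the class name `PositionalEncoding` and the `max_len` default are illustrative choices, not a standard API:

```python
import math
import torch
import torch.nn as nn

class PositionalEncoding(nn.Module):
    """Adds a fixed sinusoidal encoding table to token embeddings."""

    def __init__(self, d_model: int, max_len: int = 512):
        super().__init__()
        position = torch.arange(max_len).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        # register_buffer: saved with the model but not a learned parameter
        self.register_buffer("pe", pe)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model); slice the table to the current length,
        # then add element-wise (broadcast over the batch dimension).
        return x + self.pe[: x.size(1)]
```

Because the table is a buffer rather than a parameter, it moves with the model across devices but receives no gradient updates.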
Why can't Transformers rely on sequence order without positional encoding?
A. Because they ignore input data
B. Because they only work with images
C. Because they use recurrent layers
D. Because they process all tokens simultaneously without order
Which functions are used in sinusoidal positional encoding?
A. Sine and cosine
B. Linear and quadratic
C. Tangent and cotangent
D. Exponential and logarithm
What does the positional encoding vector depend on?
A. The batch size
B. Only the token value
C. The position in the sequence and embedding dimension
D. The model's output
How is positional encoding combined with input embeddings?
A. By adding them element-wise
B. By multiplying them
C. By concatenating them
D. By ignoring positional encoding
What is the main benefit of sinusoidal positional encoding?
A. It speeds up training
B. It allows the model to learn relative positions
C. It reduces model size
D. It removes the need for embeddings
Explain in your own words why positional encoding is important for Transformer models.
Think about how Transformers see all words at once.
Describe how sinusoidal positional encoding is calculated and why sine and cosine functions are used.
Focus on the math functions and their role in encoding position.