NLP · ~20 mins

Sequence-to-sequence architecture in NLP - Practice Problems & Coding Challenges

Challenge - 5 Problems
🧠 Conceptual · intermediate
What is the main purpose of a sequence-to-sequence model?

Sequence-to-sequence models are widely used in tasks like language translation. What is their main purpose?

A. To cluster data points into groups without labels
B. To classify images into fixed categories
C. To map an input sequence to an output sequence of possibly different length
D. To reduce the dimensionality of input data
💡 Hint

Think about tasks like translating a sentence from English to French.
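To see length-flexibility concretely, here is a toy encoder-decoder sketch in PyTorch. The sizes and batch dimensions are assumptions for illustration only (they are not taken from any question on this page): the encoder compresses a length-10 input into a fixed-size state, and the decoder unrolls that state into a length-7 output.

```python
import torch
import torch.nn as nn

# Toy illustration with assumed sizes: input and output lengths can differ.
encoder = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)
decoder = nn.LSTM(input_size=4, hidden_size=16, batch_first=True)

src = torch.randn(2, 10, 8)      # (batch, src_len=10, features)
_, state = encoder(src)          # fixed-size summary of the whole input
tgt_in = torch.randn(2, 7, 4)    # decoder inputs with target length 7
out, _ = decoder(tgt_in, state)  # decoder is initialized from encoder state
print(out.shape)                 # torch.Size([2, 7, 16])
```

The decoder's sequence length (7) is independent of the encoder's (10), which is exactly what translation needs: an English sentence and its French translation rarely have the same number of tokens.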

Predict Output · intermediate
Output shape of encoder and decoder in a seq2seq model

Consider a seq2seq model with an LSTM encoder and decoder. The encoder processes input sequences of length 10 with 16 features, and the decoder outputs sequences of length 12 with 20 features. What is the shape of the encoder's final hidden state and the decoder's output?

Python:
import torch
import torch.nn as nn

encoder = nn.LSTM(input_size=16, hidden_size=32, batch_first=True)
decoder = nn.LSTM(input_size=20, hidden_size=32, batch_first=True)

inputs = torch.randn(5, 10, 16)  # batch_size=5
encoder_outputs, (h_n, c_n) = encoder(inputs)

decoder_inputs = torch.randn(5, 12, 20)
decoder_outputs, _ = decoder(decoder_inputs, (h_n, c_n))

print(h_n.shape, decoder_outputs.shape)
A. torch.Size([5, 32]) torch.Size([5, 12, 20])
B. torch.Size([5, 1, 32]) torch.Size([12, 5, 32])
C. torch.Size([1, 5, 16]) torch.Size([5, 10, 32])
D. torch.Size([1, 5, 32]) torch.Size([5, 12, 32])
💡 Hint

Remember LSTM hidden states have shape (num_layers * num_directions, batch, hidden_size).
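You can verify the shape rule from the hint with a deliberately different configuration (a hypothetical 2-layer bidirectional LSTM, so this does not give away the answer to the question above):

```python
import torch
import torch.nn as nn

# Assumed configuration for illustration: 2 layers, bidirectional.
lstm = nn.LSTM(input_size=8, hidden_size=16, num_layers=2,
               bidirectional=True, batch_first=True)
x = torch.randn(3, 7, 8)  # (batch=3, seq_len=7, features=8)
out, (h_n, c_n) = lstm(x)

# h_n: (num_layers * num_directions, batch, hidden_size) = (4, 3, 16)
# out: (batch, seq_len, num_directions * hidden_size)    = (3, 7, 32)
print(h_n.shape, out.shape)  # torch.Size([4, 3, 16]) torch.Size([3, 7, 32])
```

Note that with `batch_first=True` only the input/output tensors put batch first; `h_n` and `c_n` always keep batch in the middle dimension.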

Hyperparameter · advanced
Choosing the right hidden size for seq2seq LSTM

You are training a sequence-to-sequence model for machine translation. Which hidden size choice is most likely to improve model capacity without causing excessive overfitting?

A. Increase hidden size from 128 to 512 with dropout and early stopping
B. Decrease hidden size from 128 to 32 to reduce overfitting
C. Keep hidden size at 128 and remove dropout layers
D. Increase hidden size to 1024 without any regularization
💡 Hint

Think about balancing model complexity and regularization.
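A minimal sketch of the capacity-plus-regularization pattern. The hidden size, dropout rate, and patience value here are illustrative assumptions, not values from the question:

```python
import torch.nn as nn

# Larger capacity paired with regularization (dropout applies between LSTM layers).
encoder = nn.LSTM(input_size=256, hidden_size=512, num_layers=2,
                  dropout=0.3, batch_first=True)

# A bare-bones early-stopping loop on validation loss (dummy per-epoch losses):
best_val, patience, wait = float("inf"), 3, 0
for val_loss in [2.1, 1.8, 1.7, 1.75, 1.76, 1.9]:
    if val_loss < best_val:
        best_val, wait = val_loss, 0  # improvement: reset the counter
    else:
        wait += 1
        if wait >= patience:
            break  # stop once validation loss has not improved for `patience` epochs
print(best_val)  # 1.7
```

Bigger hidden states only help if something (dropout, early stopping, weight decay) stops the extra capacity from memorizing the training set.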

Metrics · advanced
Evaluating seq2seq model with BLEU score

You trained a seq2seq model for text summarization. Which metric best measures how well the model output matches human summaries?

A. Confusion matrix of predicted classes
B. BLEU score measuring n-gram overlap between model and reference summaries
C. Mean Squared Error between input and output sequences
D. Accuracy of predicting the next word in the input sequence
💡 Hint

Think about metrics used in natural language generation tasks.
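The core of BLEU is clipped n-gram precision. Here is a simplified pure-Python sketch of that one component (real BLEU combines precisions for n = 1 to 4 with a brevity penalty; the example sentences are made up):

```python
from collections import Counter

def ngram_precision(candidate, reference, n):
    """Clipped n-gram precision: one building block of BLEU."""
    cand = Counter(tuple(candidate[i:i+n]) for i in range(len(candidate) - n + 1))
    ref = Counter(tuple(reference[i:i+n]) for i in range(len(reference) - n + 1))
    # Each candidate n-gram counts at most as often as it appears in the reference.
    overlap = sum(min(c, ref[g]) for g, c in cand.items())
    total = sum(cand.values())
    return overlap / total if total else 0.0

cand = "the cat sat on the mat".split()
ref = "the cat is on the mat".split()
print(ngram_precision(cand, ref, 1))  # 5/6: all unigrams match except "sat"
```

This is why BLEU suits generation tasks: it rewards overlapping phrases with a human reference rather than requiring an exact class label, which is what rules out options A, C, and D.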

🔧 Debug · expert
Why does this seq2seq training loop cause exploding gradients?

Consider this simplified training loop for a seq2seq model. Why might the gradients explode?

Python:
for input_seq, target_seq in dataloader:
    optimizer.zero_grad()
    output_seq = model(input_seq, target_seq)
    loss = loss_fn(output_seq.view(-1, vocab_size), target_seq.view(-1))
    loss.backward()
    optimizer.step()
A. No gradient clipping is applied, so gradients can grow too large during backpropagation
B. The loss function is incorrect and returns zero, causing no gradient updates
C. The optimizer.zero_grad() is called after loss.backward(), causing accumulation
D. The model input and target sequences have mismatched batch sizes
💡 Hint

Think about common causes of exploding gradients in RNNs.
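A sketch of the usual fix: clip the global gradient norm between `loss.backward()` and `optimizer.step()`. The model, sizes, and data below are placeholders standing in for the quiz's `model`, `dataloader`, and `loss_fn`, not its actual setup:

```python
import torch
import torch.nn as nn

vocab_size = 50
model = nn.LSTM(input_size=vocab_size, hidden_size=32, batch_first=True)
head = nn.Linear(32, vocab_size)  # projects hidden states to vocabulary logits
params = list(model.parameters()) + list(head.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

input_seq = torch.randn(4, 10, vocab_size)            # dummy batch
target_seq = torch.randint(0, vocab_size, (4, 10))    # dummy targets

optimizer.zero_grad()
out, _ = model(input_seq)
logits = head(out)
loss = loss_fn(logits.view(-1, vocab_size), target_seq.view(-1))
loss.backward()
# The key addition: rescale gradients so their global norm is at most 1.0.
torch.nn.utils.clip_grad_norm_(params, max_norm=1.0)
optimizer.step()
```

Because backpropagation through time multiplies Jacobians across many steps, RNN gradients can blow up on long sequences; clipping bounds the update size without changing the gradient direction (when the norm exceeds the threshold, all gradients are scaled down uniformly).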