PyTorch · ~20 mins

Text preprocessing for RNNs in PyTorch - Practice Problems & Coding Challenges

Challenge - 5 Problems
Predict Output
intermediate
Output of tokenization and padding
Given the following PyTorch code for tokenizing and padding sequences for an RNN, what is the output of the padded tensor?
PyTorch
import torch
from torch.nn.utils.rnn import pad_sequence

sequences = [torch.tensor([1, 2, 3]), torch.tensor([4, 5]), torch.tensor([6])]
padded = pad_sequence(sequences, batch_first=True, padding_value=0)
print(padded)
A. tensor([[1, 2, 3], [4, 5, 6], [0, 0, 0]])
B. tensor([[1, 2, 3], [4, 5, 0], [6, 0, 0]])
C. tensor([[1, 2, 3], [0, 4, 5], [0, 0, 6]])
D. tensor([[1, 2, 3], [4, 5], [6]])
💡 Hint
Remember that with batch_first=True, pad_sequence aligns sequences by appending padding_value to the end of the shorter ones.
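A minimal sketch of the hint, using a fresh toy batch (not the one in the challenge, so it doesn't spoil the answer):

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# Two token-ID sequences of different lengths (hypothetical toy data).
seqs = [torch.tensor([7, 8]), torch.tensor([9])]

# With batch_first=True the result has shape (batch, max_len);
# the shorter sequence is padded with padding_value at the END.
padded = pad_sequence(seqs, batch_first=True, padding_value=0)
print(padded.tolist())  # [[7, 8], [9, 0]]
```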
🧠 Conceptual
intermediate
Why use padding in RNN input sequences?
Why do we pad sequences to the same length before feeding them into an RNN?
A. Because RNNs require inputs of uniform length to process batches efficiently.
B. Because padding improves the accuracy of the RNN model.
C. Because padding converts text into numerical vectors.
D. Because padding removes stop words from the sequences.
💡 Hint
Think about how batches are processed in parallel.
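To make the hint concrete, here is a small sketch showing that ragged sequences cannot be combined into a single batch tensor, while padded ones can (toy data, not part of the challenge):

```python
import torch

# Ragged sequences cannot be stacked into one batch tensor:
ragged = [torch.tensor([1, 2, 3]), torch.tensor([4, 5])]
try:
    torch.stack(ragged)
except RuntimeError as e:
    print("stack failed:", e)

# After padding to a common length, batching works, and the whole
# batch can be processed by the RNN in parallel:
uniform = [torch.tensor([1, 2, 3]), torch.tensor([4, 5, 0])]
batch = torch.stack(uniform)
print(batch.shape)  # torch.Size([2, 3])
```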
Hyperparameter
advanced
Choosing max sequence length for padding
When preprocessing text for an RNN, what is a common approach to decide the maximum sequence length for padding?
A. Set max length to the length of the longest sequence in the entire dataset.
B. Set max length to the average length of all sequences without considering outliers.
C. Set max length to a fixed number based on domain knowledge or a percentile of sequence lengths.
D. Set max length to the length of the shortest sequence to save memory.
💡 Hint
Consider balancing memory use and information retention.
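As a rough sketch of the percentile idea from the hint (the lengths below are made-up example data):

```python
# Hypothetical sentence lengths (in tokens) for a small corpus;
# 60 is an outlier that would dominate a max-length choice.
lengths = sorted([4, 5, 5, 6, 7, 8, 9, 12, 15, 60])

# Take roughly the 90th-percentile length instead of the max, so one
# outlier doesn't force every batch to be padded out to length 60.
idx = int(0.9 * (len(lengths) - 1))
max_len = lengths[idx]
print(max_len)  # 15
```

Sequences longer than max_len would then be truncated, trading a little information in rare outliers for much less wasted padding.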
🔧 Debug
advanced
Error in embedding input shape for RNN
What error will this PyTorch code raise when feeding embedded input to an RNN?
PyTorch
import torch
import torch.nn as nn

embedding = nn.Embedding(10, 3)
inputs = torch.tensor([1, 2, 3, 4])
embedded = embedding(inputs)
rnn = nn.RNN(input_size=3, hidden_size=5, batch_first=True)
output, hidden = rnn(embedded)
A. RuntimeError: Expected 3D input for RNN but got 2D input
B. TypeError: embedding() missing 1 required positional argument
C. IndexError: index out of range in embedding
D. No error, code runs successfully
💡 Hint
Check the shape of the tensor passed to the RNN.
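A sketch of the shape fix the hint is pointing at. (Note: behaviour here depends on your PyTorch version; since 1.11, nn.RNN also accepts unbatched 2D input. Adding an explicit batch dimension makes the shape unambiguous either way.)

```python
import torch
import torch.nn as nn

embedding = nn.Embedding(10, 3)
rnn = nn.RNN(input_size=3, hidden_size=5, batch_first=True)

inputs = torch.tensor([1, 2, 3, 4])   # shape (4,): one sequence of 4 token IDs
embedded = embedding(inputs)          # shape (4, 3): 2D, no batch dimension

# Add an explicit batch dimension so the input is (batch=1, seq_len=4, input_size=3):
batched = embedded.unsqueeze(0)
output, hidden = rnn(batched)
print(output.shape)  # torch.Size([1, 4, 5])
```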
Model Choice
expert
Best preprocessing for variable-length text sequences in RNN training
You have a dataset of sentences with widely varying lengths. You want to train an RNN efficiently. Which preprocessing approach is best?
A. Convert all sequences to one-hot vectors without padding.
B. Truncate all sequences to the shortest sentence length to avoid padding.
C. Pad all sequences to the length of the longest sentence in the dataset.
D. Pad sequences to a fixed max length and use PyTorch's pack_padded_sequence to handle variable lengths.
💡 Hint
Consider both efficiency and preserving sequence information.
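For reference, a minimal sketch of the pack/unpack workflow mentioned in option D, using tiny made-up embedding tensors rather than real text:

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence, pad_packed_sequence

# Two already-embedded sequences of shape (seq_len, input_size) and their true lengths.
seqs = [torch.randn(3, 4), torch.randn(2, 4)]
lengths = torch.tensor([3, 2])

padded = pad_sequence(seqs, batch_first=True)            # (2, 3, 4)
packed = pack_padded_sequence(padded, lengths, batch_first=True,
                              enforce_sorted=False)

rnn = nn.RNN(input_size=4, hidden_size=5, batch_first=True)
packed_out, hidden = rnn(packed)  # the RNN skips the padded time steps

# Unpack back to a padded tensor, along with each sequence's true length:
out, out_lengths = pad_packed_sequence(packed_out, batch_first=True)
print(out.shape)             # torch.Size([2, 3, 5])
print(out_lengths.tolist())  # [3, 2]
```

Packing lets the RNN ignore pad positions entirely, so the padding affects memory but not the hidden states.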