Challenge - 5 Problems
Text Preprocessing Master
Get all challenges correct to earn this badge!
Test your skills under time pressure!
❓ Predict Output
intermediate · 2:00 remaining
Output of tokenization and padding
Given the following PyTorch code for tokenizing and padding sequences for an RNN, what is the output of the padded tensor?
PyTorch
import torch
from torch.nn.utils.rnn import pad_sequence

sequences = [torch.tensor([1, 2, 3]), torch.tensor([4, 5]), torch.tensor([6])]
padded = pad_sequence(sequences, batch_first=True, padding_value=0)
print(padded)
Attempts: 2 left
💡 Hint
Remember pad_sequence aligns sequences by padding shorter ones with zeros at the end when batch_first=True.
✗ Incorrect
pad_sequence with batch_first=True pads shorter sequences with the padding_value (0) at the end, so all sequences have the same length as the longest one (3 here).
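As a quick check, the quiz snippet can be run directly (assuming PyTorch is installed); the resulting tensor shows the trailing zero-padding described above:

```python
import torch
from torch.nn.utils.rnn import pad_sequence

sequences = [torch.tensor([1, 2, 3]), torch.tensor([4, 5]), torch.tensor([6])]
padded = pad_sequence(sequences, batch_first=True, padding_value=0)
# All rows are padded to length 3, the longest sequence:
# tensor([[1, 2, 3],
#         [4, 5, 0],
#         [6, 0, 0]])
print(padded)
```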
🧠 Conceptual
intermediate · 1:30 remaining
Why use padding in RNN input sequences?
Why do we pad sequences to the same length before feeding them into an RNN?
Attempts: 2 left
💡 Hint
Think about how batches are processed in parallel.
✗ Incorrect
RNNs process batches of sequences in parallel, which requires all sequences in a batch to have the same length. Padding shorter sequences ensures uniform length.
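A minimal sketch of why uniform lengths matter: ragged tensors cannot be stacked into a single batch tensor, while padded ones can (assuming PyTorch is available):

```python
import torch

# Tensors of unequal length cannot be stacked into a single batch tensor:
ragged = [torch.tensor([1, 2, 3]), torch.tensor([4, 5])]
stack_failed = False
try:
    torch.stack(ragged)
except RuntimeError:
    stack_failed = True  # stack requires all tensors to share one shape

# After padding to a common length, batching works:
padded = [torch.tensor([1, 2, 3]), torch.tensor([4, 5, 0])]
batch = torch.stack(padded)
print(batch.shape)  # torch.Size([2, 3])
```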
❓ Hyperparameter
advanced · 2:00 remaining
Choosing max sequence length for padding
When preprocessing text for an RNN, what is a common approach to decide the maximum sequence length for padding?
Attempts: 2 left
💡 Hint
Consider balancing memory use and information retention.
✗ Incorrect
Choosing a fixed max length based on domain knowledge or a percentile (e.g., 95th percentile) balances keeping most data and limiting padding overhead.
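The percentile approach can be sketched as follows (a toy corpus invented for illustration; any real dataset's token lengths would be used instead):

```python
import numpy as np

# Hypothetical corpus; in practice these would be tokenized training sentences
corpus = [
    "the cat sat",
    "a very much longer example sentence with many tokens",
    "short",
    "another sentence of moderate length here",
]
lengths = [len(s.split()) for s in corpus]

# Pick the 95th-percentile length as max_len: almost all sequences fit,
# and a few extreme outliers are truncated rather than padding everything
# to the single longest sentence.
max_len = int(np.percentile(lengths, 95))
print(max_len)
```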
🔧 Debug
advanced · 2:00 remaining
Error in embedding input shape for RNN
What error will this PyTorch code raise when feeding input to an RNN embedding layer?
import torch
import torch.nn as nn
embedding = nn.Embedding(10, 3)
inputs = torch.tensor([1, 2, 3, 4])
embedded = embedding(inputs)
rnn = nn.RNN(input_size=3, hidden_size=5, batch_first=True)
output, hidden = rnn(embedded)
Attempts: 2 left
💡 Hint
Check the shape of the tensor passed to the RNN.
✗ Incorrect
With batch_first=True, the RNN expects a 3D batched input of shape (batch_size, seq_len, input_size), but embedded has shape (seq_len, embedding_dim) = (4, 3), which is 2D. In PyTorch versions before 1.11 this raises a RuntimeError about the input dimensionality; since 1.11, nn.RNN also accepts unbatched 2D input, so on recent versions the code runs without error.
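One common fix is to add an explicit batch dimension with unsqueeze(0), which makes the input the 3D shape the batched API expects (a sketch assuming PyTorch is available):

```python
import torch
import torch.nn as nn

embedding = nn.Embedding(10, 3)
rnn = nn.RNN(input_size=3, hidden_size=5, batch_first=True)

inputs = torch.tensor([1, 2, 3, 4])  # shape (4,)
embedded = embedding(inputs)         # shape (4, 3): 2D, no batch dimension

# Add a batch dimension so the input is (batch=1, seq_len=4, input_size=3):
output, hidden = rnn(embedded.unsqueeze(0))
print(output.shape)  # torch.Size([1, 4, 5])
```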
❓ Model Choice
expert · 2:30 remaining
Best preprocessing for variable-length text sequences in RNN training
You have a dataset of sentences with widely varying lengths. You want to train an RNN efficiently. Which preprocessing approach is best?
Attempts: 2 left
💡 Hint
Consider both efficiency and preserving sequence information.
✗ Incorrect
Using pack_padded_sequence with padded sequences allows the RNN to ignore padded tokens and process variable-length sequences efficiently.
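The pad-then-pack pipeline can be sketched end to end (random embeddings stand in for real data; lengths are sorted descending so the default enforce_sorted=True holds):

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence, pad_packed_sequence

# Three pre-embedded sequences of lengths 3, 2, 1, each with feature size 4
sequences = [torch.randn(3, 4), torch.randn(2, 4), torch.randn(1, 4)]
lengths = torch.tensor([3, 2, 1])

padded = pad_sequence(sequences, batch_first=True)  # shape (3, 3, 4)
# Packing tells the RNN the true length of each row, so padded
# timesteps are skipped rather than processed:
packed = pack_padded_sequence(padded, lengths, batch_first=True)

rnn = nn.RNN(input_size=4, hidden_size=5, batch_first=True)
packed_out, hidden = rnn(packed)

# Unpack back to a padded tensor for downstream layers:
output, out_lengths = pad_packed_sequence(packed_out, batch_first=True)
print(output.shape)   # torch.Size([3, 3, 5])
print(out_lengths)    # tensor([3, 2, 1])
```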