
Text preprocessing for RNNs in PyTorch - Interactive Code Practice


Practice - 5 Tasks

Answer the questions below

1. Fill in the blank

easy

Complete the code to tokenize the input sentence into words.

PyTorch

sentence = "I love machine learning"
tokens = sentence.[1]()


A. split

B. join

C. replace

D. strip


2. Fill in the blank

medium

Complete the code to convert tokens to lowercase for uniformity.

PyTorch

tokens = ['I', 'Love', 'Machine', 'Learning']
lower_tokens = [word.[1]() for word in tokens]


A. upper

B. lower

C. capitalize

D. title


3. Fill in the blank

hard

Fix the error in the code to create a vocabulary dictionary mapping words to unique indices.

PyTorch

tokens = ['i', 'love', 'machine', 'learning', 'love']
vocab = {word: idx for idx, word in enumerate(set([1]))}


A. list

B. vocab

C. tokens

D. range


4. Fill in the blank

hard

Fill both blanks to convert a list of tokens into a list of indices using the vocabulary.

PyTorch

tokens = ['i', 'love', 'machine']
indices = [[1][word] for [2] in tokens]


A. vocab

B. word

C. tokens

D. indices


5. Fill in the blank

hard

Fill all three blanks to pad sequences to the same length using PyTorch.

PyTorch

import torch
from torch.nn.utils.rnn import [1]

sequences = [torch.tensor([1, 2, 3]), torch.tensor([4, 5])]
padded = [2](sequences, batch_first=True, padding_value=[3])


A. pad_sequence

B. pad_packed_sequence

C. pack_sequence


Practice

(1/5)

1. Why do we split text into tokens before feeding it to an RNN?

easy

A. Because RNNs process sequences of numbers, not raw text

B. To reduce the size of the dataset

C. To make the text look nicer

D. Because tokens are easier to print


Solution

Step 1: Understand RNN input requirements

Step 2: Role of tokenization

Final Answer:

Quick Check:
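
As a rough illustration of the two steps above, the sketch below turns one sentence into the integer tensor that an RNN pipeline would consume. The sorted vocabulary ordering and the variable names are assumptions made for this example, not something the exercise prescribes.

```python
import torch

sentence = "I love machine learning"

# Tokenize on whitespace and lowercase each token for uniformity.
tokens = [word.lower() for word in sentence.split()]

# Assumption: sort the unique tokens so the index assignment is reproducible
# (iterating over a bare set gives no guaranteed order).
vocab = {word: idx for idx, word in enumerate(sorted(set(tokens)))}

# Map tokens to integer indices; embedding layers expect integer (long) tensors.
indices = torch.tensor([vocab[word] for word in tokens], dtype=torch.long)

print(tokens)   # ['i', 'love', 'machine', 'learning']
print(indices)  # tensor([0, 2, 3, 1])
```

The raw string never reaches the model; only the index tensor does.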

Solution

Step 1: Identify PyTorch padding utilities

Step 2: Check other options

Final Answer:

Quick Check:
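
To make the difference between the three utilities in `torch.nn.utils.rnn` concrete, here is a small sketch that calls each one on the same two sequences. Using 0 as the padding value is an assumption for illustration.

```python
import torch
from torch.nn.utils.rnn import pad_sequence, pack_sequence, pad_packed_sequence

sequences = [torch.tensor([1, 2, 3]), torch.tensor([4, 5])]

# pad_sequence: list of variable-length tensors -> one padded tensor.
# Assumption: 0 is used as the padding value.
padded = pad_sequence(sequences, batch_first=True, padding_value=0)

# pack_sequence: list of tensors -> PackedSequence, which stores no padding.
# The input is already sorted by decreasing length, as the default requires.
packed = pack_sequence(sequences)

# pad_packed_sequence: PackedSequence -> (padded tensor, original lengths).
unpacked, lengths = pad_packed_sequence(packed, batch_first=True)

print(padded)   # tensor([[1, 2, 3], [4, 5, 0]])
print(lengths)  # tensor([3, 2])
```

Only `pad_sequence` takes a plain list of tensors and returns a padded tensor directly; `pad_packed_sequence` expects a `PackedSequence` as input.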

Solution

Step 1: Understand input sequences

Step 2: pad_sequence with batch_first=True

Final Answer:

Quick Check:

Solution

Step 1: Check pad_sequence default behavior

Step 2: Effect on output shape

Final Answer:

Quick Check:
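
The effect of the default on the output shape can be checked with a short sketch like the one below, which reuses the two example sequences from the padding task. The variable names are illustrative.

```python
import torch
from torch.nn.utils.rnn import pad_sequence

sequences = [torch.tensor([1, 2, 3]), torch.tensor([4, 5])]

# Default is batch_first=False: output is (max_len, batch).
time_major = pad_sequence(sequences)

# With batch_first=True: output is (batch, max_len).
batch_major = pad_sequence(sequences, batch_first=True)

print(time_major.shape)   # torch.Size([3, 2])
print(batch_major.shape)  # torch.Size([2, 3])
```

The two results are transposes of each other, so the choice mainly has to match the `batch_first` setting of the RNN layer that consumes the batch.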

Solution

Step 1: Tokenize text and convert tokens to integers

Step 2: Pad sequences and prepare batch tensor

Final Answer:

Quick Check:
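
Putting both steps together, a minimal end-to-end sketch might look like the following. The example sentences, the `<pad>` token, and the choice to reserve index 0 for padding are all assumptions made for illustration.

```python
import torch
from torch.nn.utils.rnn import pad_sequence

sentences = ["I love machine learning", "I love PyTorch"]
PAD = "<pad>"  # hypothetical padding token, reserved at index 0

# Step 1: tokenize (lowercase + whitespace split) and build the vocabulary
# in order of first appearance.
tokenized = [s.lower().split() for s in sentences]
vocab = {PAD: 0}
for tokens in tokenized:
    for word in tokens:
        vocab.setdefault(word, len(vocab))

# Convert each token list to a tensor of integer indices.
encoded = [
    torch.tensor([vocab[word] for word in tokens], dtype=torch.long)
    for tokens in tokenized
]

# Step 2: record the true lengths, then pad to one (batch, max_len) tensor.
lengths = torch.tensor([len(seq) for seq in encoded])
batch = pad_sequence(encoded, batch_first=True, padding_value=vocab[PAD])

print(batch)    # tensor([[1, 2, 3, 4], [1, 2, 5, 0]])
print(lengths)  # tensor([4, 3])
```

Keeping `lengths` alongside the padded batch lets later code distinguish real tokens from padding.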