Complete the code to tokenize the input sentence into words.
sentence = "I love machine learning"
tokens = sentence.[1]()
The split() method breaks the sentence into a list of words, splitting on whitespace by default.
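A possible completed version of the exercise above, with the blank filled in by split():

```python
sentence = "I love machine learning"
tokens = sentence.split()  # splits on whitespace by default
print(tokens)  # ['I', 'love', 'machine', 'learning']
```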
Complete the code to convert tokens to lowercase for uniformity.
tokens = ['I', 'Love', 'Machine', 'Learning']
lower_tokens = [word.[1]() for word in tokens]
The lower() method converts each word to lowercase, which helps the model treat words like 'Love' and 'love' the same.
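With the blank filled in by lower(), the completed exercise looks like this:

```python
tokens = ['I', 'Love', 'Machine', 'Learning']
# lower() returns a lowercase copy of each string
lower_tokens = [word.lower() for word in tokens]
print(lower_tokens)  # ['i', 'love', 'machine', 'learning']
```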
Fix the error in the code to create a vocabulary dictionary mapping words to unique indices.
tokens = ['i', 'love', 'machine', 'learning', 'love']
vocab = {word: idx for idx, word in enumerate(set([1]))}
We use set(tokens) to get unique words, then enumerate to assign each a unique index. Note that set ordering is not guaranteed, so the exact index assigned to each word may differ between runs.
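A completed sketch of the exercise, with the blank filled in by tokens. Here sorted() is added (it is not part of the exercise) to make the word-to-index mapping reproducible, since plain set() order varies per run:

```python
tokens = ['i', 'love', 'machine', 'learning', 'love']
# sorted() makes the index assignment deterministic across runs
vocab = {word: idx for idx, word in enumerate(sorted(set(tokens)))}
print(vocab)
```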
Fill both blanks to convert a list of tokens into a list of indices using the vocabulary.
tokens = ['i', 'love', 'machine']
indices = [[1][word] for [2] in tokens]
We use the vocabulary dictionary vocab to get the index for each word in the tokens list.
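With the blanks filled in by vocab and word, the completed lookup looks like this. The vocabulary shown is an example mapping, chosen here for illustration:

```python
tokens = ['i', 'love', 'machine']
# example vocabulary; in practice this comes from the previous exercise
vocab = {'i': 0, 'love': 1, 'machine': 2, 'learning': 3}
indices = [vocab[word] for word in tokens]
print(indices)  # [0, 1, 2]
```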
Fill all three blanks to pad sequences to the same length using PyTorch.
import torch
from torch.nn.utils.rnn import [1]

sequences = [torch.tensor([1, 2, 3]), torch.tensor([4, 5])]
padded = [2](sequences, batch_first=True, padding_value=[3])
pad_sequence pads a list of variable-length tensors to the same length. We use padding_value=0 to pad with zeros.
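Filling all three blanks with pad_sequence, pad_sequence, and 0 gives the completed exercise (this assumes PyTorch is installed):

```python
import torch
from torch.nn.utils.rnn import pad_sequence

sequences = [torch.tensor([1, 2, 3]), torch.tensor([4, 5])]
# batch_first=True yields shape (batch, max_len); shorter tensors are
# right-padded with padding_value
padded = pad_sequence(sequences, batch_first=True, padding_value=0)
print(padded)  # tensor([[1, 2, 3], [4, 5, 0]])
```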