Complete the code to tokenize the sentence into words using Python.
from nltk.tokenize import word_tokenize

sentence = "Hello world!"
tokens = [1](sentence)
The word_tokenize function from NLTK splits a sentence into word tokens, treating punctuation marks (like the exclamation point) as separate tokens.
Complete the code to convert all tokens to lowercase.
tokens = ['Hello', 'World']
lower_tokens = [[1] for token in tokens]
Using token.lower() converts each word to lowercase, which is common in NLP preprocessing.
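A completed version of this exercise, using only the standard string method:

```python
tokens = ['Hello', 'World']
# Blank [1] is token.lower(); str.lower() returns a lowercased copy
lower_tokens = [token.lower() for token in tokens]
print(lower_tokens)  # ['hello', 'world']
```

Lowercasing before counting or comparing tokens keeps 'Hello' and 'hello' from being treated as different words.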
Fix the error in the code to remove stopwords from the token list.
from nltk.corpus import stopwords

stop_words = set(stopwords.words('english'))
tokens = ['this', 'is', 'a', 'test']
filtered = [word for word in tokens if [1] not in stop_words]
The variable word is the element being checked in the list comprehension to filter out stopwords.
Fill both blanks to create a dictionary of word counts from a list of tokens.
tokens = ['apple', 'banana', 'apple', 'orange', 'banana', 'apple']
word_counts = { [1]: tokens.count([2]) for [1] in set(tokens) }
We use word as the variable name in the dictionary comprehension and count how many times each word appears in tokens.
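A completed version, using only built-in operations:

```python
tokens = ['apple', 'banana', 'apple', 'orange', 'banana', 'apple']
# Blanks [1] and [2] are both word; iterating over set(tokens)
# counts each distinct word exactly once
word_counts = {word: tokens.count(word) for word in set(tokens)}
print(word_counts)  # {'apple': 3, 'banana': 2, 'orange': 1} (key order may vary)
```

Note that tokens.count() rescans the list for every distinct word; for large corpora, collections.Counter(tokens) does the same job in a single pass.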
Fill all three blanks to create a list of stemmed tokens using NLTK's PorterStemmer.
from nltk.stem import PorterStemmer

ps = PorterStemmer()
tokens = ['running', 'jumps', 'easily']
stemmed = [ps.[1](token) for [2] in tokens if len([3]) > 2]
The method to stem words is stem. The loop variable is token, which is also used in the length check.
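A completed version might look like this (PorterStemmer ships with NLTK and needs no extra data download):

```python
from nltk.stem import PorterStemmer

ps = PorterStemmer()
tokens = ['running', 'jumps', 'easily']
# Blank [1] is stem; blanks [2] and [3] are token.
# All three words are longer than 2 characters, so none are filtered out.
stemmed = [ps.stem(token) for token in tokens if len(token) > 2]
print(stemmed)  # ['run', 'jump', 'easili']
```

The Porter algorithm strips suffixes heuristically, so stems like 'easili' are not always dictionary words; use a lemmatizer if you need valid word forms.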