Practice - 5 Tasks
Answer the questions below
1. Fill in the blank
Difficulty: easy
Complete the code to tokenize the sentence into words.
NLP
from nltk.tokenize import word_tokenize
sentence = "Hello world!"
tokens = [1](sentence)
Common Mistakes
Using sentence.split(), which splits only on whitespace and leaves punctuation attached to words (for example, "world!").
Using undefined functions like tokenize_words.
Explanation
The blank is word_tokenize. This function from nltk.tokenize splits the sentence into word and punctuation tokens, so "Hello world!" yields Hello, world, and ! as separate tokens.
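To make the first common mistake concrete, the sketch below compares str.split() with a punctuation-aware split. It uses only the standard library; the regular expression is an illustrative stand-in chosen for this example and is not how NLTK's word_tokenize works internally.

```python
import re

sentence = "Hello world!"

# str.split() breaks on whitespace only, so "!" stays attached to "world".
whitespace_tokens = sentence.split()

# Assumed stand-in pattern: runs of word characters, or single
# punctuation marks. This is a rough approximation, not NLTK's rules.
simple_tokens = re.findall(r"\w+|[^\w\s]", sentence)

print(whitespace_tokens)  # ['Hello', 'world!']
print(simple_tokens)      # ['Hello', 'world', '!']
```

For this sentence the stand-in separates the exclamation mark the way the exercise expects from word_tokenize; on harder input (contractions, abbreviations) the two can differ.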
2. Fill in the blank
Difficulty: medium
Complete the code to convert all tokens to lowercase.
NLP
tokens = ['Hello', 'World']
lower_tokens = [[1] for token in tokens]
Common Mistakes
Using token.upper() which makes letters uppercase.
Using token.capitalize() which only capitalizes the first letter.
Explanation
The blank is token.lower(), which returns a lowercase copy of each token.
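The three string methods named in this task can be compared side by side. This is a minimal sketch using the exercise's own token list; the variable names other than tokens and lower_tokens are made up for illustration.

```python
tokens = ['Hello', 'World']

# lower() is the correct choice for case normalization.
lower_tokens = [token.lower() for token in tokens]

# The two common mistakes, shown for contrast.
upper_tokens = [token.upper() for token in tokens]
capitalized_tokens = [token.capitalize() for token in tokens]

print(lower_tokens)        # ['hello', 'world']
print(upper_tokens)        # ['HELLO', 'WORLD']
print(capitalized_tokens)  # ['Hello', 'World']
```

Note that capitalize() leaves these particular tokens unchanged, because each one already starts with a capital letter followed by lowercase letters.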
3. Fill in the blank
Difficulty: hard
Fix the error in the code to remove stopwords from the token list.
NLP
from nltk.corpus import stopwords
stop_words = set(stopwords.words('english'))
tokens = ['this', 'is', 'a', 'test']
filtered_tokens = [token for token in tokens if [1]]
Common Mistakes
Using 'token in stop_words', which keeps only the stopwords instead of removing them.
Comparing a token to the whole set with == or !=, which tests equality with the set object rather than membership in it.
Explanation
The blank is token not in stop_words. To remove stopwords, keep only the tokens that are not members of the stop_words set.
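The correct condition and both common mistakes can be checked without NLTK. The sketch below assumes a small hand-picked stopword set in place of set(stopwords.words('english')); that subset and the result variable names are hypothetical, chosen only so the example runs without the NLTK corpus.

```python
# Hypothetical stand-in for set(stopwords.words('english')).
stop_words = {'this', 'is', 'a'}
tokens = ['this', 'is', 'a', 'test']

# Correct: keep tokens that are NOT members of the set.
filtered_tokens = [token for token in tokens if token not in stop_words]

# Mistake 1: the membership test is inverted, so only stopwords survive.
only_stopwords = [token for token in tokens if token in stop_words]

# Mistake 2: a string never equals a set, so nothing is filtered out.
unfiltered = [token for token in tokens if token != stop_words]

print(filtered_tokens)  # ['test']
print(only_stopwords)   # ['this', 'is', 'a']
print(unfiltered)       # ['this', 'is', 'a', 'test']
```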
4. Fill in the blank
Difficulty: hard
Fill both blanks to create a dictionary of word counts from tokens.
NLP
from collections import Counter
tokens = ['apple', 'banana', 'apple', 'orange', 'banana', 'apple']
word_counts = [1](token for token in tokens)
print(word_counts[[2]])
Common Mistakes
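Using defaultdict, which expects a default factory function as its first argument and does not count items by itself.
Accessing counts with a key that is not one of the tokens, such as a misspelled word.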
Explanation
Blank 1 is Counter, which counts how often each token occurs. Blank 2 is 'apple': word_counts['apple'] returns the count for that token.
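A completed version of this task, as a minimal sketch. Passing the list directly to Counter is equivalent to the generator expression in the exercise; the 'grape' lookup and the most_common call are additions for illustration.

```python
from collections import Counter

tokens = ['apple', 'banana', 'apple', 'orange', 'banana', 'apple']

# Counter accepts any iterable of hashable items.
word_counts = Counter(tokens)

apple_count = word_counts['apple']
# A key that never occurred returns 0 instead of raising KeyError.
missing_count = word_counts['grape']
# most_common(n) returns the n highest counts as (item, count) pairs.
top = word_counts.most_common(1)

print(apple_count)    # 3
print(missing_count)  # 0
print(top)            # [('apple', 3)]
```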
5. Fill in the blank
Difficulty: hard
Fill all three blanks to lemmatize tokens using WordNetLemmatizer.
NLP
from nltk.stem import [1]
lemmatizer = [2]()
tokens = ['running', 'jumps', 'easily']
lemmas = [lemmatizer.[3](token) for token in tokens]
Common Mistakes
Using PorterStemmer which is for stemming, not lemmatizing.
Using stem method instead of lemmatize.
Explanation
Blanks 1 and 2 are both WordNetLemmatizer: the class is imported and then instantiated. Blank 3 is lemmatize, the method that returns the lemma of a token.
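The completed code can be wrapped in a small helper, sketched below. The function name lemmatize_tokens and its pos parameter are hypothetical conveniences for this example. Running it assumes NLTK is installed and the WordNet corpus has been downloaded (for instance with nltk.download('wordnet')).

```python
def lemmatize_tokens(tokens, pos="n"):
    """Return the WordNet lemma of each token for the given part of speech."""
    # Imported inside the function so that defining the helper
    # does not itself require NLTK to be installed.
    from nltk.stem import WordNetLemmatizer

    lemmatizer = WordNetLemmatizer()
    # lemmatize() treats words as nouns unless a pos tag is given.
    return [lemmatizer.lemmatize(token, pos=pos) for token in tokens]
```

Calling lemmatize_tokens(['running', 'jumps', 'easily']) reproduces the exercise. Because the default part of speech is noun, verb forms such as 'running' may come back unchanged; passing pos="v" asks the lemmatizer to treat the tokens as verbs instead.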