
First NLP pipeline - Interactive Code Practice

Practice - 5 Tasks
Answer the questions below
1. Fill in the blank (easy)

Complete the code to tokenize the sentence into words.

from nltk.tokenize import word_tokenize
sentence = "Hello world!"
tokens = [1](sentence)
Options:
A. sentence.split
B. word_tokenize
C. tokenize_words
D. split_words
Common Mistakes
Using sentence.split(), which splits only on whitespace and leaves punctuation attached to words (for example 'world!').
Using functions that are never imported or defined, such as tokenize_words.
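To make the first mistake concrete, here is a minimal sketch that compares whitespace splitting with a punctuation-aware split. The regular expression is only a rough stand-in chosen for illustration so the snippet runs without NLTK; it is not how word_tokenize works internally.

```python
import re

sentence = "Hello world!"

# Whitespace splitting leaves the "!" glued to the last word.
whitespace_tokens = sentence.split()

# Rough stand-in for a punctuation-aware tokenizer (an assumption for
# illustration; NLTK's word_tokenize applies more elaborate rules).
regex_tokens = re.findall(r"\w+|[^\w\s]", sentence)

print(whitespace_tokens)  # ['Hello', 'world!']
print(regex_tokens)       # ['Hello', 'world', '!']
```

For this sentence, word_tokenize also separates the exclamation mark into its own token, which is why it is the expected answer.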
2. Fill in the blank (medium)

Complete the code to convert all tokens to lowercase.

tokens = ['Hello', 'World']
lower_tokens = [[1] for token in tokens]
Options:
A. token.capitalize()
B. token.upper()
C. token.lower()
D. token.title()
Common Mistakes
Using token.upper(), which converts every letter to uppercase.
Using token.capitalize(), which uppercases the first letter and lowercases the rest.
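A quick side-by-side of the four string methods offered as options may help. This is a small sketch using the same token list as the task; the variable names other than tokens and lower_tokens are made up for the comparison.

```python
tokens = ['Hello', 'World']

# Each list comprehension applies one str method to every token.
lower_tokens = [token.lower() for token in tokens]
upper_tokens = [token.upper() for token in tokens]
capitalized_tokens = [token.capitalize() for token in tokens]
title_tokens = [token.title() for token in tokens]

print(lower_tokens)        # ['hello', 'world']
print(upper_tokens)        # ['HELLO', 'WORLD']
print(capitalized_tokens)  # ['Hello', 'World']
print(title_tokens)        # ['Hello', 'World']
```

Only lower() yields a fully lowercase list, which is what lets 'Hello' and 'hello' be treated as the same word later in a pipeline.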
3. Fill in the blank (hard)

Complete the code to remove stopwords from the token list.

from nltk.corpus import stopwords
stop_words = set(stopwords.words('english'))
tokens = ['this', 'is', 'a', 'test']
filtered_tokens = [token for token in tokens if [1]]
Options:
A. token not in stop_words
B. token == stop_words
C. token != stop_words
D. token in stop_words
Common Mistakes
Using 'token in stop_words', which keeps only the stopwords instead of removing them.
Using == or != to compare a token with the whole set, which does not test membership: a string never equals a set.
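The effect of each option can be checked directly. The sketch below swaps in a tiny hand-written stopword set so that it runs without downloading NLTK data; that set and the result variable names are assumptions for illustration only.

```python
# Tiny hand-written stopword set (an assumption for illustration; the task
# itself uses NLTK's English list, which is larger).
stop_words = {'this', 'is', 'a'}
tokens = ['this', 'is', 'a', 'test']

# Option A: membership test, drops the stopwords.
kept = [token for token in tokens if token not in stop_words]

# Option D: inverted test, keeps only the stopwords.
only_stopwords = [token for token in tokens if token in stop_words]

# Options B and C: a str is never equal to a set, so == is always False
# and != is always True.
equal_to_set = [token for token in tokens if token == stop_words]
not_equal_to_set = [token for token in tokens if token != stop_words]

print(kept)              # ['test']
print(only_stopwords)    # ['this', 'is', 'a']
print(equal_to_set)      # []
print(not_equal_to_set)  # ['this', 'is', 'a', 'test']
```

Wrapping the list in set(...), as the task does, keeps each membership test fast even when the token list is long.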
4. Fill in the blanks (hard)

Fill both blanks to create a dictionary of word counts from tokens.

from collections import Counter
tokens = ['apple', 'banana', 'apple', 'orange', 'banana', 'apple']
word_counts = [1](token for token in tokens)
print(word_counts[[2]])
Options:
A. Counter
B. defaultdict
C. 'apple'
D. 'banana'
Common Mistakes
Using defaultdict, which expects a default factory such as int as its first argument and does not count items by itself.
Looking up a count with a key that is not in tokens.
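The difference between the two containers can be sketched as follows, using the token list from the task. The 'kiwi' lookup and the manual_counts name are added here purely for illustration.

```python
from collections import Counter, defaultdict

tokens = ['apple', 'banana', 'apple', 'orange', 'banana', 'apple']

# Counter tallies every item of an iterable in one call.
word_counts = Counter(tokens)
print(word_counts['apple'])   # 3
print(word_counts['banana'])  # 2
print(word_counts['kiwi'])    # 0, missing keys count as zero

# defaultdict can reach the same result, but it needs a factory (int)
# and an explicit loop to do the counting.
manual_counts = defaultdict(int)
for token in tokens:
    manual_counts[token] += 1

print(dict(manual_counts) == dict(word_counts))  # True
```

Passing tokens directly is equivalent to the generator expression in the task, since Counter accepts any iterable.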
5. Fill in the blanks (hard)

Fill all three blanks to lemmatize tokens using WordNetLemmatizer.

from nltk.stem import [1]
lemmatizer = [2]()
tokens = ['running', 'jumps', 'easily']
lemmas = [lemmatizer.[3](token) for token in tokens]
Options:
A. WordNetLemmatizer
B. PorterStemmer
C. lemmatize
D. stem
Common Mistakes
Importing PorterStemmer, which performs stemming rather than lemmatization.
Calling the stem method, which belongs to stemmers, instead of lemmatize.
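Taken together, the steps from tasks 1 to 4 form a small pipeline: tokenize, lowercase, remove stopwords, count. The sketch below chains them using only the standard library so that it runs without NLTK data. The regex tokenizer, the STOP_WORDS set, the punctuation filter, and the function name simple_pipeline are all assumptions for illustration, and lemmatization from task 5 is left out because it depends on NLTK's WordNet data.

```python
import re
from collections import Counter

# Hypothetical mini stopword list; NLTK's English list is much larger.
STOP_WORDS = {'this', 'is', 'a', 'the'}

def simple_pipeline(text):
    """Tokenize, lowercase, drop stopwords and punctuation, then count."""
    # Step 1: rough tokenization (a stand-in for word_tokenize).
    tokens = re.findall(r"\w+|[^\w\s]", text)
    # Step 2: lowercase every token.
    lowered = [token.lower() for token in tokens]
    # Step 3: keep alphanumeric tokens that are not stopwords.
    words = [t for t in lowered if t.isalnum() and t not in STOP_WORDS]
    # Step 4: count what is left.
    return Counter(words)

counts = simple_pipeline("This is a test. The test is a simple test!")
print(counts)  # Counter({'test': 3, 'simple': 1})
```

Lowercasing before the stopword check matters here: 'This' and 'The' only match the lowercase entries in STOP_WORDS after step 2.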