Text preprocessing (tokenization, stemming, lemmatization) in ML Python - Interactive Code Practice

Practice - 5 Tasks
Answer the questions below
1. Fill in the blank (easy)

Complete the code to tokenize the sentence into words using NLTK.

ML Python
from nltk.tokenize import word_tokenize
sentence = "I love learning AI!"
tokens = [1](sentence)
print(tokens)
Options:
A. split_words
B. sent_tokenize
C. tokenize_words
D. word_tokenize
Common Mistakes
Using sent_tokenize which splits text into sentences instead of words.
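The difference between sentence-level and word-level splitting can be sketched without NLTK. The regex helpers below (`naive_sentences` and `naive_words`) are hypothetical stand-ins written for illustration only; they are much cruder than NLTK's tokenizers and will not match their output on harder text such as abbreviations or contractions.

```python
import re

def naive_sentences(text):
    # Hypothetical helper: split after '.', '!' or '?' when followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def naive_words(text):
    # Hypothetical helper: runs of word characters, or single punctuation marks.
    return re.findall(r"\w+|[^\w\s]", text)

text = "I love learning AI! It is fun."

sentences = naive_sentences(text)
words = naive_words(text)

print(sentences)  # ['I love learning AI!', 'It is fun.']
print(words)      # ['I', 'love', 'learning', 'AI', '!', 'It', 'is', 'fun', '.']
```

A sentence splitter returns a few long strings, while a word splitter returns many short tokens, which is why the two cannot be swapped in the task above.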
2. Fill in the blank (medium)

Complete the code to stem the word using PorterStemmer.

ML Python
from nltk.stem import PorterStemmer
stemmer = PorterStemmer()
word = "running"
stemmed_word = stemmer.[1](word)
print(stemmed_word)
Options:
A. lemmatize
B. split
C. stem
D. tokenize
Common Mistakes
Using lemmatize which is a different method for word normalization.
3. Fill in the blank (hard)

Complete the code to lemmatize the word correctly using WordNetLemmatizer.

ML Python
from nltk.stem import WordNetLemmatizer
lemmatizer = WordNetLemmatizer()
word = "better"
lemma = lemmatizer.[1](word, pos='a')
print(lemma)
Options:
A. lemmatize
B. stem
C. tokenize
D. split
Common Mistakes
Using stem method which does not consider part of speech.
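Why part of speech matters can be illustrated with a toy model. This is only a conceptual sketch: `toy_stem`, `toy_lemmatize`, `SUFFIXES` and `LEMMAS` are hypothetical names, the suffix rules are not the Porter algorithm, and the tiny lookup table merely stands in for a real lexicon such as WordNet.

```python
# Toy suffix rules: a stand-in for a rule-based stemmer (NOT the Porter algorithm).
SUFFIXES = ("ing", "ly", "s")

# Toy lookup table keyed by (word, part of speech): a stand-in for a real lexicon.
LEMMAS = {
    ("better", "a"): "good",   # adjective reading
    ("running", "v"): "run",   # verb reading
}

def toy_stem(word):
    # Strip the first matching suffix; no knowledge of meaning or part of speech.
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def toy_lemmatize(word, pos="n"):
    # Look the word up under the given part of speech; fall back to the word itself.
    return LEMMAS.get((word, pos), word)

print(toy_stem("better"))            # better (no suffix rule applies)
print(toy_lemmatize("better", "a"))  # good   (found under the adjective entry)
print(toy_lemmatize("better", "n"))  # better (no noun entry, so unchanged)
```

The rule-based function only looks at the characters of the word, while the lookup-based one gives different answers for the same word depending on the `pos` argument.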
4. Fill in the blank (hard)

Fill both blanks to create a dictionary of word stems for words longer than 4 characters.

ML Python
from nltk.stem import PorterStemmer
stemmer = PorterStemmer()
words = ['running', 'jumps', 'easily', 'fairly']
stem_dict = {word: [1] for word in words if len(word) [2] 4}
print(stem_dict)
Options:
A. stemmer.stem(word)
B. word
C. >
D. <=
Common Mistakes
Using the word itself instead of its stem.
Using '<=' instead of '>' in the condition.
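The general shape of a filtered dictionary comprehension, `{key: value_expr for key in items if condition}`, can be sketched on its own. In the example below, `transform` is a hypothetical placeholder (it just upper-cases the word) rather than a stemmer, and the two short words are added to the list purely to make the effect of the condition visible.

```python
def transform(word):
    # Hypothetical placeholder for any per-word operation.
    return word.upper()

# 'run' and 'cat' are added here only so the filter has something to exclude.
words = ["running", "jumps", "easily", "fairly", "run", "cat"]

# Keep only words longer than 4 characters.
long_words = {word: transform(word) for word in words if len(word) > 4}

# Flipping the comparison keeps the opposite set of words.
short_words = {word: transform(word) for word in words if len(word) <= 4}

print(long_words)   # {'running': 'RUNNING', 'jumps': 'JUMPS', 'easily': 'EASILY', 'fairly': 'FAIRLY'}
print(short_words)  # {'run': 'RUN', 'cat': 'CAT'}
```

The expression before `for` decides what is stored as the value, and the comparison after `if` decides which keys appear at all.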
5. Fill in the blank (hard)

Fill both blanks to create a dictionary of lemmas for words longer than 5 characters.

ML Python
from nltk.stem import WordNetLemmatizer
lemmatizer = WordNetLemmatizer()
words = ['running', 'jumps', 'easily', 'fairly']
lemma_dict = {word: lemmatizer.[1](word, pos='r') for word in words if len(word) [2] 5}
print(lemma_dict)
Options:
B. lemmatize
C. >
D. stem
Common Mistakes
Adding extra characters in the key part of the dictionary.
Using stem instead of lemmatize.
Using '<' instead of '>' in the condition.