
Why preprocessing cleans raw text in NLP - Test Your Understanding

Practice - 5 Tasks
Answer the questions below
1. Fill in the blank (easy)

Complete the code to convert raw text to lowercase for consistent processing.

clean_text = raw_text.[1]()
Options:
A. capitalize
B. title
C. lower
D. upper
Common Mistakes
Using upper() converts the text to uppercase, but lowercase is the usual convention for normalization. capitalize() and title() leave the text in mixed case.
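To see why case normalization matters, here is a minimal sketch. The sample sentence and the names distinct_raw and distinct_clean are assumptions made for illustration; they are not part of the exercise.

```python
# Hypothetical sample input, chosen only for illustration.
raw_text = "The cat saw the Cat"

# Without normalization, "The"/"the" and "cat"/"Cat" count as different words.
distinct_raw = set(raw_text.split())

# str.lower() returns a new string with every cased character lowercased.
clean_text = raw_text.lower()
distinct_clean = set(clean_text.split())

print(len(distinct_raw), len(distinct_clean))  # prints: 5 3
```

After lowercasing, the five distinct surface forms collapse to three words, which is the consistency the task refers to.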
2. Fill in the blank (medium)

Complete the code to remove punctuation from the text.

import string
clean_text = ''.join(char for char in raw_text if char not in [1])
Options:
A. string.punctuation
B. string.whitespace
C. string.ascii_letters
D. string.digits
Common Mistakes
Filtering on string.digits or string.whitespace removes digits or whitespace and leaves the punctuation in place.
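The filtering pattern from this task can be tried on a small input. This is a sketch under assumptions: the sample sentence is invented, and the str.translate variant is shown only as a common alternative, not as the exercise's intended approach. Note that string.punctuation covers ASCII punctuation only.

```python
import string

# Hypothetical sample input, chosen only for illustration.
raw_text = "Hello, world! It's 9 a.m."

# Keep every character that is not ASCII punctuation.
clean_text = "".join(char for char in raw_text if char not in string.punctuation)

# Alternative: a translation table that deletes the same characters.
table = str.maketrans("", "", string.punctuation)
clean_text_alt = raw_text.translate(table)

print(clean_text)  # prints: Hello world Its 9 am
```

Digits and spaces survive the filter; only the comma, exclamation mark, apostrophe, and periods are dropped.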
3. Fill in the blank (hard)

Fix the error in the code to tokenize text into words correctly.

tokens = raw_text.[1](' ')
Options:
A. strip
B. join
C. replace
D. split
Common Mistakes
Using join() combines tokens instead of splitting.
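The difference between splitting and joining can be checked directly. This is a minimal sketch; the sample string (with a deliberate double space) and the variable names are assumptions for illustration.

```python
# Hypothetical sample input with two spaces between the first two words.
raw_text = "NLP  is fun"

# Splitting on a single space keeps an empty string where spaces repeat.
tokens_space = raw_text.split(" ")

# Splitting with no argument treats any run of whitespace as one separator.
tokens_any = raw_text.split()

# join() goes the other way: it combines a list of strings into one string.
rejoined = " ".join(tokens_any)

print(tokens_space)  # prints: ['NLP', '', 'is', 'fun']
print(tokens_any)    # prints: ['NLP', 'is', 'fun']
```

When the input may contain repeated spaces, tabs, or newlines, calling split() with no argument avoids the empty tokens.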
4. Fill in the blank (hard)

Fill both blanks to create a dictionary of word counts from tokenized text.

word_counts = {}
for word in tokens:
    word_counts[[1]] = word_counts.get([2], 0) + 1
Options:
A. word
B. token
C. word.lower()
D. tokens
Common Mistakes
Using 'token' raises a NameError because that variable is never defined. Using 'tokens' raises a TypeError because a list is unhashable and cannot be a dictionary key.
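The counting loop can be run end to end on a small list. This is a sketch, assuming an invented token list; the comparison with collections.Counter is added only to confirm the result and is not part of the exercise.

```python
from collections import Counter

# Hypothetical token list, chosen only for illustration.
tokens = ["the", "cat", "and", "the", "hat"]

word_counts = {}
for word in tokens:
    # get() returns 0 the first time a word is seen, so no KeyError is raised.
    word_counts[word] = word_counts.get(word, 0) + 1

# Counter from the standard library produces the same tallies.
counter_counts = dict(Counter(tokens))

print(word_counts)  # prints: {'the': 2, 'cat': 1, 'and': 1, 'hat': 1}
```

The same loop variable is used as the key on both sides of the assignment, so each word increments its own count.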
5. Fill in the blank (hard)

Fill all three blanks to filter out stopwords from tokens and create a clean list.

stopwords = {'and', 'the', 'is', 'in'}
clean_tokens = [[1] for [2] in tokens if [3] not in stopwords]
Options:
A. word
B. token
Common Mistakes
Mixing variable names causes NameError or incorrect filtering.
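Stopword filtering with a list comprehension can be sketched as follows. The sample sentence and the names used here are assumptions for illustration; the second comprehension shows one way to handle capitalized tokens and goes slightly beyond what the task asks.

```python
stopwords = {'and', 'the', 'is', 'in'}

# Hypothetical sample input, chosen only for illustration.
tokens = "The cat is in the hat and the bag".split()

# The same variable name appears in all three positions of the comprehension.
clean_tokens = [word for word in tokens if word not in stopwords]

# The stopword set is lowercase, so "The" slips through unless the
# comparison is made on the lowercased word.
clean_tokens_ci = [word for word in tokens if word.lower() not in stopwords]

print(clean_tokens)     # prints: ['The', 'cat', 'hat', 'bag']
print(clean_tokens_ci)  # prints: ['cat', 'hat', 'bag']
```

Lowercasing the text before tokenizing, as in the first task, makes the plain membership test sufficient.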