Information retrieval basics in NLP - Interactive Code Practice

Practice - 5 Tasks
Answer the questions below
1. Fill in the blank (easy)

Complete the code to tokenize the input text into words.

tokens = text.[1]()
Options
A. split
B. join
C. replace
D. strip
Common Mistakes
Using join() instead of split()
Using replace() which changes characters but does not split
Using strip() which only removes whitespace from ends
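
As a rough illustration of why split() is the method that tokenizes, the sketch below runs all four options on a made-up sentence. The sample text and the variable names joined, replaced, and stripped are assumptions for demonstration only and are not part of the exercise.

```python
# Hypothetical sample sentence; any string would work here.
text = "information retrieval finds relevant documents"

# split() with no arguments breaks the string on runs of whitespace.
tokens = text.split()
print(tokens)
# ['information', 'retrieval', 'finds', 'relevant', 'documents']

# The other options do different jobs:
joined = " ".join(tokens)          # join() builds one string from a list
replaced = text.replace(" ", "_")  # replace() returns a string, not a list
stripped = "  padded  ".strip()    # strip() trims whitespace at the ends only
```

Whitespace splitting is only the simplest form of tokenization; it leaves punctuation attached to words, which may or may not matter for a given task.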
2. Fill in the blank (medium)

Complete the code to count the frequency of each word in the list.

from collections import [1]
word_counts = [1](words)
Options
A. deque
B. Counter
C. OrderedDict
D. defaultdict
Common Mistakes
Using defaultdict which needs manual counting
Using OrderedDict which keeps order but doesn't count
Using deque which is for queues, not counting
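
A minimal sketch of the counting step, assuming a small made-up word list (the list contents are not from the exercise). It shows Counter building the frequency table in one call.

```python
from collections import Counter

# Hypothetical token list standing in for the output of a tokenizer.
words = ["search", "engine", "search", "index", "search", "index"]

# Counter tallies how often each element occurs.
word_counts = Counter(words)

print(word_counts["search"])       # 3
print(word_counts["index"])        # 2
print(word_counts["missing"])      # 0 (absent keys count as zero, no KeyError)
print(word_counts.most_common(1))  # [('search', 3)]
```

A defaultdict(int) can reach the same result, but only with an explicit loop that increments each key.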
3. Fill in the blank (hard)

Fill in the blank to compute the term frequency (TF) of a word: its count divided by the total number of words.

tf = word_counts[[1]] / sum(word_counts.values())
Options
A. word_counts
B. word
C. words
D. 'word'
Common Mistakes
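Using the bare name word without quotes, which raises a NameError when no variable named word has been defined
Using words or word_counts, which are the whole list and the whole counter rather than a key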
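
The TF calculation can be sketched end to end as follows. The word list and the key "search" are illustrative assumptions; the exercise itself looks up the string key 'word'.

```python
from collections import Counter

# Hypothetical token list; "search" appears 2 times out of 4 tokens.
words = ["search", "engine", "search", "index"]
word_counts = Counter(words)

# Total number of tokens is the sum of all counts.
total = sum(word_counts.values())

# Term frequency: occurrences of the term divided by total tokens.
tf = word_counts["search"] / total
print(tf)  # 0.5
```

The key passed to word_counts must be a string (or a variable holding one), because the counter is indexed by the words themselves.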
4. Fill in the blanks (hard)

Fill both blanks to create a dictionary of words with frequency greater than 1.

freq_words = {word: count for word, count in word_counts.items() if count [1] [2]}
Options
A. >
B. 1
C. >=
D. 0
Common Mistakes
Using >= 0 includes all words
Using > 0 includes words with count 1
Using wrong operators like < or ==
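
A short sketch of the filtering step, using an invented word list, shows how the condition count > 1 drops words that occur only once.

```python
from collections import Counter

# Hypothetical token list: "search" and "index" repeat, the rest do not.
word_counts = Counter(["search", "engine", "search", "index", "index", "query"])

# Keep only the entries whose count is strictly greater than 1.
freq_words = {word: count for word, count in word_counts.items() if count > 1}
print(freq_words)  # {'search': 2, 'index': 2}
```

Swapping the condition for count > 0 or count >= 0 would keep every entry, since all counts here are at least 1.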
5. Fill in the blanks (hard)

Fill all three blanks to compute inverse document frequency (IDF) for a word.

import math
idf = math.log([1] / (1 + [2][[3]]))
Options
A. total_docs
B. doc_freq
C. 'word'
D. word_counts
Common Mistakes
Using word_counts instead of doc_freq for document frequency
Not quoting the word key
Forgetting to add 1 in denominator
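
To make the IDF formula concrete, here is a minimal sketch over three made-up documents. The corpus, the way doc_freq is built, and the helper function idf are all assumptions for illustration; the exercise only supplies the names total_docs and doc_freq.

```python
import math

# Hypothetical mini-corpus of three documents.
docs = [
    "search engine ranks documents",
    "index stores documents",
    "query hits the index",
]
total_docs = len(docs)

# Document frequency: the number of documents containing each term.
# set() ensures a term is counted at most once per document.
doc_freq = {}
for doc in docs:
    for term in set(doc.split()):
        doc_freq[term] = doc_freq.get(term, 0) + 1

def idf(term):
    """Smoothed IDF, matching the exercise: log(N / (1 + df))."""
    return math.log(total_docs / (1 + doc_freq.get(term, 0)))

print(idf("documents"))  # 0.0, since the term is in 2 of 3 docs: log(3 / 3)
print(idf("search") > idf("documents"))  # True: rarer terms score higher
```

The 1 in the denominator avoids division by zero for terms that appear in no document. One side effect of this particular smoothing is that a term found in every document gets a slightly negative score.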