Document processing pipeline in NLP - Interactive Code Practice

Practice - 5 Tasks
Answer the questions below
1. Fill in the blank
Difficulty: easy

Complete the code to load a document as a string.

with open('document.txt', 'r') as file:
    text = file.[1]()
Options:
A. readlines
B. readline
C. read
D. open
Common Mistakes
Using readline() reads only one line, not the full document.
Using readlines() returns a list of lines, not a single string.
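The difference between the three read methods is easiest to see side by side. The sketch below is only an illustration: it writes its own temporary three-line file (the sample text and the variable names are assumptions, not part of the exercise) and reads it back three ways.

```python
import os
import tempfile

# Write a small three-line sample file so the example is self-contained.
sample = "First line.\nSecond line.\nThird line.\n"
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write(sample)
    path = tmp.name

with open(path, "r", encoding="utf-8") as file:
    whole_text = file.read()        # one string holding the entire file

with open(path, "r", encoding="utf-8") as file:
    first_line = file.readline()    # one string holding only the first line

with open(path, "r", encoding="utf-8") as file:
    line_list = file.readlines()    # a list with one string per line

os.remove(path)  # clean up the temporary file

print(type(whole_text).__name__, repr(whole_text))
print(type(first_line).__name__, repr(first_line))
print(type(line_list).__name__, line_list)
```

Only read() gives a single string covering the whole document, which is what later pipeline steps such as sentence splitting expect.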
2. Fill in the blank
Difficulty: medium

Complete the code to split the document text into sentences.

import nltk
nltk.download('punkt')  # newer NLTK releases may need 'punkt_tab' instead
sentences = nltk.tokenize.[1](text)
Options:
A. word_tokenize
B. split
C. tokenize
D. sent_tokenize
Common Mistakes
Using word_tokenize splits text into words, not sentences.
split is a basic string method; it breaks on a fixed separator and does not recognize sentence boundaries, so abbreviations such as "Dr." are cut apart.
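To make the problem with plain string splitting concrete, here is a small sketch that uses only built-in string methods. The sample sentence is made up for illustration; it contains two sentences but several periods.

```python
# Two sentences, but the abbreviations "Dr." and "a.m." also end in a period.
text = "Dr. Smith arrived at 9 a.m. on Monday. She reviewed the report."

# Naive approach: break on every period followed by a space.
naive_pieces = text.split(". ")

for piece in naive_pieces:
    print(repr(piece))

# The text has 2 sentences, yet the naive split yields 4 fragments.
print(len(naive_pieces))
```

A sentence tokenizer such as nltk's sent_tokenize is designed to treat common abbreviations differently from real sentence endings, which is why the exercise uses it rather than split.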
3. Fill in the blank
Difficulty: hard

Complete the code to remove stopwords from the token list.

from nltk.corpus import stopwords
stop_words = set(stopwords.words('english'))
tokens = ['this', 'is', 'a', 'test']
filtered = [word for word in tokens if word [1] stop_words]
Options:
A. in
B. not in
C. ==
D. !=
Common Mistakes
Using in keeps only stopwords, which is the opposite of what we want.
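The effect of each candidate operator can be checked directly. In this sketch a tiny hand-written set stands in for stopwords.words('english') so that it runs without NLTK data; that set is an assumption for illustration only.

```python
# Hypothetical mini stopword set, standing in for stopwords.words('english').
stop_words = {"this", "is", "a", "the", "of"}
tokens = ["this", "is", "a", "test"]

# "not in" keeps the tokens that are absent from the stopword set.
kept = [word for word in tokens if word not in stop_words]

# "in" does the opposite and keeps only the stopwords.
only_stopwords = [word for word in tokens if word in stop_words]

# "==" and "!=" compare a string with the whole set: never equal, always unequal.
equal_to_set = [word for word in tokens if word == stop_words]
unequal_to_set = [word for word in tokens if word != stop_words]

print(kept)            # ['test']
print(only_stopwords)  # ['this', 'is', 'a']
print(equal_to_set)    # []
print(unequal_to_set)  # ['this', 'is', 'a', 'test']
```

Storing the stopwords in a set, as the exercise does, also keeps each membership test fast on average compared with searching a list.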
4. Fill in the blank
Difficulty: hard

Fill both blanks to create a dictionary of word counts from tokens.

word_counts = [1]()
for word in tokens:
    word_counts[word] = word_counts.get(word, [2]) + 1
Options:
A. dict
B. 0
C. 1
D. list
Common Mistakes
Using list() instead of dict() fails: a list has no get() method, so word_counts.get(...) raises AttributeError.
Using 1 as default count causes counts to start at 2.
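The counting loop and the off-by-one mistake can be compared on a short token list. The tokens below are made up for illustration, and collections.Counter is included only as a cross-check from the standard library; the exercise itself does not use it.

```python
from collections import Counter

tokens = ["to", "be", "or", "not", "to", "be"]

# Correct: a missing word defaults to 0, so its first occurrence counts as 1.
word_counts = dict()
for word in tokens:
    word_counts[word] = word_counts.get(word, 0) + 1

# Mistake: a default of 1 makes every first occurrence count as 2.
off_by_one = dict()
for word in tokens:
    off_by_one[word] = off_by_one.get(word, 1) + 1

print(word_counts)  # {'to': 2, 'be': 2, 'or': 1, 'not': 1}
print(off_by_one)   # {'to': 3, 'be': 3, 'or': 2, 'not': 2}

# Counter builds the same mapping as the correct loop in one call.
print(word_counts == dict(Counter(tokens)))  # True
```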
5. Fill in the blank
Difficulty: hard

Fill all three blanks to create a TF-IDF vectorizer and transform documents.

from sklearn.feature_extraction.text import [1]
vectorizer = [2](stop_words='english')
X = vectorizer.[3](documents)
Options:
A. TfidfVectorizer
C. fit_transform
D. fit
Common Mistakes
Using fit alone returns the fitted vectorizer itself, not the transformed document matrix.
Importing a misspelled or nonexistent class name raises ImportError.
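The distinction between fit and fit_transform can be sketched without scikit-learn. TinyTfidf below is a hypothetical teaching class, not the library's implementation: it tokenizes by whitespace, ignores stop_words, and uses a smoothed IDF with L2-normalized rows, which is assumed here to resemble the library's default weighting. Its numbers should not be expected to match TfidfVectorizer exactly.

```python
import math
from collections import Counter


class TinyTfidf:
    """Hypothetical teaching class; not scikit-learn's implementation."""

    def fit(self, documents):
        tokenized = [doc.lower().split() for doc in documents]
        self.vocabulary_ = sorted({tok for doc in tokenized for tok in doc})
        n_docs = len(tokenized)
        # Document frequency: in how many documents each token appears.
        doc_freq = Counter(tok for doc in tokenized for tok in set(doc))
        # Smoothed IDF (assumed variant): ln((1 + n) / (1 + df)) + 1
        self.idf_ = {
            tok: math.log((1 + n_docs) / (1 + doc_freq[tok])) + 1
            for tok in self.vocabulary_
        }
        return self  # fit returns the fitted object, not a matrix

    def transform(self, documents):
        rows = []
        for doc in documents:
            counts = Counter(doc.lower().split())
            row = [counts[tok] * self.idf_[tok] for tok in self.vocabulary_]
            norm = math.sqrt(sum(v * v for v in row)) or 1.0
            rows.append([v / norm for v in row])  # L2-normalize each row
        return rows

    def fit_transform(self, documents):
        # Learn the vocabulary and IDF weights, then weight the same documents.
        return self.fit(documents).transform(documents)


documents = ["the cat sat", "the dog sat", "the dog barked"]

fitted = TinyTfidf().fit(documents)       # a TinyTfidf instance
X = TinyTfidf().fit_transform(documents)  # one weighted row per document

print(type(fitted).__name__)
print(fitted.vocabulary_)
print(len(X), len(X[0]))
```

In the first row, "cat" (present in one document) receives a larger weight than "the" (present in all three), which is the behavior TF-IDF is meant to produce.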