
Tokenization (word and sentence) in NLP - Interactive Code Practice

Practice - 5 Tasks
Answer the questions below
1. Fill in the blank (easy)

Complete the code to split the text into words using Python's split method.

text = "Hello world! Let's learn tokenization."
words = text.[1]()
A. strip
B. join
C. replace
D. split
Common Mistakes
Using join() instead of split()
Using replace(), which substitutes characters instead of splitting the string
Using strip(), which only removes whitespace from the two ends of the string
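
To make the difference between the four options concrete, here is a minimal sketch that runs each string method on the text from task 1. The variable names `stripped`, `replaced`, and `rejoined` are illustrative and not part of the exercise.

```python
text = "Hello world! Let's learn tokenization."

# split() with no arguments breaks the string on runs of whitespace
# and returns a list of pieces.
words = text.split()
print(words)  # ['Hello', 'world!', "Let's", 'learn', 'tokenization.']

# strip() only trims whitespace at the two ends; the result is still one string.
stripped = text.strip()

# replace() swaps one substring for another; the result is still one string.
replaced = text.replace(" ", "_")

# join() goes in the opposite direction: it glues a list back into one string.
rejoined = " ".join(words)
```

Note that whitespace splitting leaves punctuation attached to the neighbouring word ('world!' and 'tokenization.'), which is one reason dedicated tokenizers exist.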
2. Fill in the blank (medium)

Complete the code to split the text into sentences using the nltk library.

import nltk
nltk.download('punkt')  # newer NLTK releases may ask for 'punkt_tab' instead
text = "Hello world! Let's learn tokenization."
sentences = nltk.tokenize.[1](text)
A. sent_tokenize
B. word_tokenize
C. tokenize_words
D. split
Common Mistakes
Using word_tokenize, which splits the text into words rather than sentences
Using split, which is a string method, not an nltk function
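
The contrast between sentence-level and word-level splitting can be sketched without any downloads. The regular expression below is only a rough stand-in chosen for illustration; it is not the Punkt model that nltk uses, and it would mis-split abbreviations such as "Dr. Smith".

```python
import re

text = "Hello world! Let's learn tokenization."

# Sentence level (rough stand-in, NOT nltk's Punkt model):
# split on whitespace that directly follows '.', '!' or '?'.
sentences = re.split(r"(?<=[.!?])\s+", text)
print(sentences)  # ['Hello world!', "Let's learn tokenization."]

# Word level: plain whitespace splitting gives a longer, finer-grained list.
words = text.split()
print(len(sentences), len(words))  # 2 5
```

The same text yields two sentence tokens but five word tokens, which is why choosing the word-level function in this task gives the wrong kind of result.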
3. Fill in the blank (hard)

Complete the code so that it tokenizes words correctly using nltk.

import nltk
nltk.download('punkt')
text = "Hello world! Let's learn tokenization."
words = nltk.tokenize.word_tokenize([1])
A. nltk
B. words
C. text
D. tokenize
Common Mistakes
Passing the wrong variable, such as 'words', which is not defined yet
Passing the module name instead of the text
4. Fill in the blank (hard)

Fill both blanks to create a dictionary with words as keys and their lengths as values, only for words longer than 3 characters.

text = "Tokenization splits text into words and sentences."
words = text.split()
lengths = { [1] : len([2]) for [1] in words if len([1]) > 3 }
A. word
B. words
C. text
D. len
Common Mistakes
Using 'words' (the whole list) instead of 'word' inside len()
Using 'text', which is the full string, not a single word
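
The completed comprehension from task 4 can be sketched as follows. The second dictionary, `clean_lengths`, is an optional variation added for illustration and is not part of the exercise: it strips surrounding punctuation before measuring.

```python
import string

text = "Tokenization splits text into words and sentences."
words = text.split()

# Keys are the words, values are their lengths; 'and' is dropped (length 3).
lengths = {word: len(word) for word in words if len(word) > 3}
print(lengths)
# {'Tokenization': 12, 'splits': 6, 'text': 4, 'into': 4, 'words': 5, 'sentences.': 10}

# Variation (not in the exercise): remove leading/trailing punctuation first,
# so the trailing period no longer counts toward the length.
cleaned = [w.strip(string.punctuation) for w in words]
clean_lengths = {w: len(w) for w in cleaned if len(w) > 3}
print(clean_lengths["sentences"])  # 9
```

Because `split()` keeps the final period attached, the last key is 'sentences.' with length 10, not 'sentences' with length 9.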
5. Fill in the blank (hard)

Fill all three blanks to create a list of sentences from text, then a list of words from the first sentence, and finally count the words.

import nltk
nltk.download('punkt')
text = "Hello world! Let's learn tokenization."
sentences = nltk.tokenize.[1](text)
first_sentence_words = nltk.tokenize.word_tokenize([2])
word_count = len([3])
A. sent_tokenize
B. sentences[0]
C. first_sentence_words
D. word_tokenize
Common Mistakes
Using word_tokenize instead of sent_tokenize to get the sentences
Passing text instead of the first sentence (sentences[0]) to word_tokenize
Counting the length of sentences instead of first_sentence_words
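
The three-step chain in task 5 (sentences, then words of the first sentence, then a count) can be sketched with pluggable tokenizers. The helpers `simple_sent_tokenize`, `simple_word_tokenize`, and `first_sentence_word_count` are hypothetical stand-ins written for this example so that it runs without downloads; they are not nltk functions and will not match nltk's output on every input.

```python
import re
from typing import Callable, List


def simple_sent_tokenize(text: str) -> List[str]:
    """Hypothetical stand-in: split after '.', '!' or '?' followed by whitespace."""
    return re.split(r"(?<=[.!?])\s+", text.strip())


def simple_word_tokenize(sentence: str) -> List[str]:
    """Hypothetical stand-in: words (optionally with an apostrophe part)
    or single punctuation marks."""
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", sentence)


def first_sentence_word_count(
    text: str,
    sent_tok: Callable[[str], List[str]] = simple_sent_tokenize,
    word_tok: Callable[[str], List[str]] = simple_word_tokenize,
) -> int:
    sentences = sent_tok(text)                     # step 1: list of sentences
    first_sentence_words = word_tok(sentences[0])  # step 2: tokens of the first one
    return len(first_sentence_words)               # step 3: count those tokens


text = "Hello world! Let's learn tokenization."
print(first_sentence_word_count(text))  # 3  ('Hello', 'world', '!')
```

With nltk installed and its tokenizer data downloaded, `nltk.tokenize.sent_tokenize` and `nltk.tokenize.word_tokenize` could be passed as `sent_tok` and `word_tok` in place of the stand-ins. Note that punctuation counts as a token here, so the first sentence has three tokens, not two.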