Practice - 5 Tasks
Answer the questions below
1. Fill in the blank (easy)
Complete the code to split the text into words using Python's split method.
NLP
text = "Hello world! Let's learn tokenization."
words = text.[1]()
Common Mistakes
Using join() instead of split()
Using replace(), which substitutes characters instead of splitting
Using strip(), which only removes whitespace at the ends
The split() method, called with no arguments, divides a string into a list of words at runs of whitespace. Punctuation stays attached to the neighboring word.
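As a quick check of the explanation above, the snippet below runs the completed exercise. It uses only the built-in str.split() and the text from the task; nothing else is assumed.

```python
# Whitespace tokenization with the built-in str.split().
text = "Hello world! Let's learn tokenization."

# With no arguments, split() breaks the string at runs of whitespace.
words = text.split()

print(words)
# ['Hello', 'world!', "Let's", 'learn', 'tokenization.']
```

Note that "world!" and "tokenization." keep their punctuation, which is one reason dedicated tokenizers exist.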
2. Fill in the blank (medium)
Complete the code to split the text into sentences using the nltk library.
NLP
import nltk
nltk.download('punkt')
text = "Hello world! Let's learn tokenization."
sentences = nltk.tokenize.[1](text)
Common Mistakes
Using word_tokenize, which splits into words
Using split, which is a string method, not an nltk function
sent_tokenize splits text into sentences, while word_tokenize splits into words.
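To make the idea of sentence splitting concrete without downloading any models, here is a minimal standard-library sketch. The function naive_sent_split is a hypothetical stand-in written for illustration. It is not how nltk's sent_tokenize works internally, and it is far less robust.

```python
import re

def naive_sent_split(text):
    """Hypothetical stand-in for a sentence tokenizer (illustration only).

    Splits at whitespace that directly follows '.', '!' or '?'.
    """
    return re.split(r"(?<=[.!?])\s+", text.strip())

text = "Hello world! Let's learn tokenization."
sentences = naive_sent_split(text)
print(sentences)
# ['Hello world!', "Let's learn tokenization."]

# The naive rule breaks on abbreviations:
print(naive_sent_split("Dr. Smith arrived. He sat down."))
# ['Dr.', 'Smith arrived.', 'He sat down.']
```

The second call shows why a trained sentence tokenizer is usually preferred over a punctuation rule.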
3. Fill in the blank (hard)
Fix the error in the code to tokenize words using nltk correctly.
NLP
import nltk
nltk.download('punkt')
text = "Hello world! Let's learn tokenization."
words = nltk.tokenize.word_tokenize([1])
Common Mistakes
Passing the wrong variable like 'words' which is not defined yet
Passing the module name instead of the text
The word_tokenize function needs the text string as input to split it into words.
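The two mistakes listed above can be reproduced with a small stand-in tokenizer. The function simple_word_tokenize is hypothetical and only approximates a word tokenizer with a regular expression; it is used here so the sketch runs without nltk.

```python
import re

def simple_word_tokenize(text):
    """Hypothetical stand-in for a word tokenizer (illustration only).

    Returns runs of word characters and single punctuation marks.
    """
    return re.findall(r"\w+|[^\w\s]", text)

text = "Hello world! Let's learn tokenization."

# Correct: pass the string that holds the text.
tokens = simple_word_tokenize(text)
print(tokens)
# ['Hello', 'world', '!', 'Let', "'", 's', 'learn', 'tokenization', '.']

# Mistake: passing a module instead of the text raises TypeError.
try:
    simple_word_tokenize(re)
    module_call_failed = False
except TypeError:
    module_call_failed = True

# Mistake: using a name that has not been assigned yet raises NameError.
try:
    simple_word_tokenize(not_defined_yet)
    undefined_call_failed = False
except NameError:
    undefined_call_failed = True

print(module_call_failed, undefined_call_failed)  # True True
```

The stand-in splits "Let's" differently from nltk's word_tokenize, so treat its output as an approximation only.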
4. Fill in the blank (hard)
Fill both blanks to create a dictionary with words as keys and their lengths as values, only for words longer than 3 characters.
NLP
text = "Tokenization splits text into words and sentences."
words = text.split()
lengths = { [1] : len([2]) for [1] in words if len([1]) > 3 }
Common Mistakes
Using 'words' instead of 'word' inside len()
Using 'text' which is the full string, not a single word
We use 'word' as the variable for each item in words, and len(word) gives the length.
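Filled in, the comprehension from this task might look like the sketch below. It uses only built-ins and the text from the exercise.

```python
text = "Tokenization splits text into words and sentences."
words = text.split()

# Keep only words longer than 3 characters; map each word to its length.
lengths = {word: len(word) for word in words if len(word) > 3}

print(lengths)
# {'Tokenization': 12, 'splits': 6, 'text': 4, 'into': 4, 'words': 5, 'sentences.': 10}
```

Because split() leaves punctuation attached, the last key is 'sentences.' with length 10, not 'sentences' with length 9. The word 'and' is dropped because its length is exactly 3.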
5. Fill in the blank (hard)
Fill all three blanks to create a list of sentences from text, then a list of words from the first sentence, and finally count the words.
NLP
import nltk
nltk.download('punkt')
text = "Hello world! Let's learn tokenization."
sentences = nltk.tokenize.[1](text)
first_sentence_words = nltk.tokenize.word_tokenize([2])
word_count = len([3])
Common Mistakes
Using word_tokenize instead of sent_tokenize for sentences
Passing text instead of first sentence to word_tokenize
Counting length of sentences instead of words
sent_tokenize splits text into sentences; sentences[0] is the first sentence; first_sentence_words holds the tokenized words to count.
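The three steps can be sketched end to end as follows. The wrapper count_first_sentence_words and its regex fallback are assumptions added for illustration: the fallback only exists so the sketch still runs when nltk or its tokenizer data is unavailable, and it merely approximates nltk's behavior. Depending on the installed nltk version, the required data package may be named 'punkt_tab' rather than 'punkt'.

```python
import re

def count_first_sentence_words(text):
    """Return (sentences, first_sentence_words, word_count) for text.

    Hypothetical helper: uses nltk when it and its tokenizer data are
    available, otherwise falls back to rough regex approximations.
    """
    try:
        import nltk
        sentences = nltk.tokenize.sent_tokenize(text)
        first_sentence_words = nltk.tokenize.word_tokenize(sentences[0])
    except (ImportError, LookupError):
        # Fallback (assumption, not part of the exercise):
        # split after ., ! or ? and separate punctuation from words.
        sentences = re.split(r"(?<=[.!?])\s+", text.strip())
        first_sentence_words = re.findall(r"\w+|[^\w\s]", sentences[0])
    return sentences, first_sentence_words, len(first_sentence_words)

text = "Hello world! Let's learn tokenization."
sentences, first_sentence_words, word_count = count_first_sentence_words(text)

print(sentences[0])          # Hello world!
print(first_sentence_words)  # ['Hello', 'world', '!']
print(word_count)            # 3
```

The count is 3 rather than 2 because the exclamation mark is returned as its own token.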