
Tokenization in spaCy in NLP - Interactive Code Practice

Practice - 5 Tasks
Answer the questions below
Task 1: Fill in the blank (easy)

Complete the code to load the spaCy English model.

import spacy
nlp = spacy.[1]("en_core_web_sm")
Options:
A. load
B. tokenize
C. process
D. parse
Common Mistakes
Using 'tokenize', 'process', or 'parse' instead of 'load'. spacy.load() is the function that loads a trained pipeline such as en_core_web_sm by name.
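A minimal sketch of the completed line, assuming spaCy v3. spacy.load() raises OSError when en_core_web_sm has not been downloaded (python -m spacy download en_core_web_sm), so this sketch falls back to spacy.blank("en"), a tokenizer-only English pipeline. The fallback is an illustrative addition, not part of the task.

```python
import spacy

try:
    # Load the small trained English pipeline by name.
    nlp = spacy.load("en_core_web_sm")
except OSError:
    # Illustrative fallback only: a blank English pipeline that contains
    # just the rule-based tokenizer (no tagger, parser, or NER).
    nlp = spacy.blank("en")

print(nlp.pipe_names)  # trained components; an empty list for the blank fallback
doc = nlp("This is a sentence.")
print([token.text for token in doc])
```

Either pipeline returns a Doc when called on text; only a trained pipeline fills in tags, dependencies, and entities.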
Task 2: Fill in the blank (medium)

Complete the code to create a spaCy Doc object from text.

doc = nlp([1])
Options:
A. nlp
B. sentence
C. text
D. "This is a sentence."
Common Mistakes
Passing a variable name such as text or sentence without defining it first.
Passing the nlp object itself.
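A sketch of the completed call, assuming spaCy v3. To run without a model download, it uses spacy.blank("en") as a stand-in for en_core_web_sm (an assumption). The sketch shows that nlp() accepts either a string literal or a variable that holds a string, and returns a Doc in both cases.

```python
import spacy
from spacy.tokens import Doc

# Assumption: a blank English pipeline stands in for en_core_web_sm;
# it still tokenizes text when called.
nlp = spacy.blank("en")

# Pass the text directly as a string literal...
doc = nlp("This is a sentence.")

# ...or pass a variable, as long as it is defined first.
text = "This is a sentence."
doc_from_var = nlp(text)

print(type(doc).__name__)  # Doc
print(len(doc))            # 5 tokens: This, is, a, sentence, .
print(doc.text == text)    # True: the Doc keeps the original text verbatim
```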
Task 3: Fill in the blank (hard)

Complete the code to print the text of each token in the Doc object.

for token in doc:
    print(token.[1])
Options:
A. string
B. tokens
C. text
D. word
Common Mistakes
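Using 'tokens' or 'word', which are not Token attributes, or 'string', which was removed in spaCy v3 (text_with_ws replaces it when you need the text plus trailing whitespace).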
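A sketch of the completed loop, assuming spaCy v3 and using spacy.blank("en") as a stand-in for en_core_web_sm (an assumption, so no model download is needed). It prints each token's index and verbatim text. It also shows that text_with_ws, which is token.text followed by its trailing whitespace, rebuilds the original string exactly.

```python
import spacy

nlp = spacy.blank("en")  # assumption: tokenizer-only stand-in for en_core_web_sm
doc = nlp("Hello, world! spaCy splits punctuation.")

for token in doc:
    # token.i is the token's position in the Doc; token.text is its verbatim text.
    print(token.i, token.text)

# Joining text_with_ws for every token reproduces the input string.
rebuilt = "".join(token.text_with_ws for token in doc)
print(rebuilt == doc.text)  # True
```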
Task 4: Fill in the blanks (hard)

Fill both blanks to create a list of token texts for tokens longer than 3 characters.

tokens = [token.[1] for token in doc if len(token.[2]) > 3]
Options:
A. text
B. lemma_
D. pos_
Common Mistakes
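Using 'lemma_' or 'pos_'. Both are strings, but they hold the token's lemma and its part-of-speech tag rather than the token's own text, so they give the wrong list items and the wrong length check.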
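A sketch of the completed comprehension, assuming spaCy v3 and a spacy.blank("en") stand-in for en_core_web_sm (an assumption, so no model download is needed). Note that len() counts characters of the token text, punctuation included. The is_punct filter at the end is an optional variant, not part of the task.

```python
import spacy

nlp = spacy.blank("en")  # assumption: tokenizer-only stand-in for en_core_web_sm
doc = nlp("This is a sentence.")

# Keep the text of every token whose own text is longer than 3 characters.
tokens = [token.text for token in doc if len(token.text) > 3]
print(tokens)  # ['This', 'sentence']

# Optional variant: keep every token except punctuation, using the is_punct flag.
words = [token.text for token in doc if not token.is_punct]
print(words)  # ['This', 'is', 'a', 'sentence']
```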
Task 5: Fill in the blanks (hard)

Fill all three blanks to create a dictionary of token texts and their part-of-speech tags for tokens longer than 4 characters.

token_pos = {token.[1]: token.[2] for token in doc if len(token.[3]) > 4}
Options:
A. text
B. pos_
D. lemma_
Common Mistakes
Using 'lemma_' instead of 'pos_' for POS tags.
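Checking length on 'pos_' instead of 'text'. 'pos_' is a tag string such as 'NOUN', so the filter would test the tag's length rather than the token's.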
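A sketch of the completed dictionary comprehension, assuming spaCy v3, where the Doc constructor accepts a pos list. So that the sketch runs without a trained tagger, the POS tags are assigned by hand and are illustrative assumptions. With en_core_web_sm loaded, calling nlp(text) fills pos_ for you.

```python
import spacy
from spacy.tokens import Doc

nlp = spacy.blank("en")

# Assumption: hand-assigned Universal POS tags stand in for a trained tagger.
words = ["The", "quick", "brown", "foxes", "jumped"]
tags = ["DET", "ADJ", "ADJ", "NOUN", "VERB"]
doc = Doc(nlp.vocab, words=words, pos=tags)

# Map each token's text to its coarse POS tag for tokens longer than 4 characters.
token_pos = {token.text: token.pos_ for token in doc if len(token.text) > 4}
print(token_pos)
# {'quick': 'ADJ', 'brown': 'ADJ', 'foxes': 'NOUN', 'jumped': 'VERB'}
```

Dictionary keys are unique, so if the same word appears twice, only the tag of its last occurrence is kept.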