Complete the code to tokenize the sentence before named entity recognition.
import nltk

sentence = "Apple is looking at buying U.K. startup for $1 billion"
tokens = nltk.word_tokenize([1])
print(tokens)
The word_tokenize function needs the sentence string as input to split it into words.
Complete the code to tag parts of speech for the tokens.
pos_tags = nltk.pos_tag([1])
print(pos_tags)
The pos_tag function requires a list of tokens to assign part-of-speech tags.
Complete the code to perform named entity recognition on the POS-tagged tokens.
named_entities = nltk.ne_chunk([1])
print(named_entities)
The ne_chunk function requires POS-tagged tokens as input to identify named entities.
Fill both blanks to extract named entity labels and their word tokens from the tree.
for subtree in named_entities:
    if hasattr(subtree, '[1]') and subtree.label() == '[2]':
        print('Entity:', ' '.join([token for token, pos in subtree.leaves()]))
We check whether the subtree has a label attribute; hasattr expects the attribute name as a string, so the first blank is 'label'. We then compare subtree.label() to the entity type string for the second blank, e.g., 'PERSON'.
Fill all three blanks to create a dictionary of named entities and their types.
entities = {
    ' '.join([token for token, pos in subtree.leaves()]): subtree.[1]()
    for subtree in named_entities
    if hasattr(subtree, '[2]')
}
print(entities)
# Filter only entities of type [3]
The dictionary comprehension extracts entity names as keys and their labels as values. We check for the label attribute and call label() to get the entity type; using leaves instead of label would return the word tokens rather than the entity type. Finally, we filter for entities of type PERSON.