Practice

1. What is the main purpose of Named Entity Recognition (NER) in Natural Language Processing?

easy

A. To count the number of words in a sentence

B. To translate text from one language to another

C. To find names of people, places, and organizations in text

D. To correct spelling mistakes in text

Solution

Step 1: Understand NER's role
NER is designed to identify and classify named entities like people, places, and organizations in text.
Step 2: Compare with other NLP tasks
Translation, word counting, and spell checking are different tasks unrelated to NER.
Final Answer:
To find names of people, places, and organizations in text -> Option C
Quick Check:
NER = Find names [OK]

Hint: NER extracts names of people, places, and organizations from text [OK]

Common Mistakes:

Confusing NER with translation
Thinking NER counts words
Mixing NER with spell checking
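The input/output contract of NER (text in, entity and type pairs out) can be sketched without any library. The toy below uses a gazetteer, meaning a plain dictionary lookup. It is not how NLTK's ne_chunk() works, since that relies on a pretrained statistical chunker. The GAZETTEER entries and the toy_ner name are hypothetical.

```python
import re

# Hypothetical gazetteer: known names mapped to entity types.
# A toy lookup for illustration only; NLTK's ne_chunk uses a trained model instead.
GAZETTEER = {
    "Barack Obama": "PERSON",
    "Hawaii": "GPE",
    "United Nations": "ORGANIZATION",
}

def toy_ner(text, gazetteer=GAZETTEER):
    """Return (entity, type) pairs in the order they appear in text."""
    found = []
    for name, label in gazetteer.items():
        # Word boundaries stop a name from matching inside a longer word.
        for match in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            found.append((match.start(), name, label))
    # Sort by character offset so the output follows the text order.
    return [(name, label) for _, name, label in sorted(found)]

print(toy_ner("Barack Obama was born in Hawaii and spoke at the United Nations."))
# [('Barack Obama', 'PERSON'), ('Hawaii', 'GPE'), ('United Nations', 'ORGANIZATION')]
```

Counting words, translating, or spell checking would never produce output of this shape. That contrast is the point of Option C.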

2. Which NLTK function is used to perform Named Entity Recognition after POS tagging?

easy

A. ne_chunk()

B. word_tokenize()

C. pos_tag()

D. sent_tokenize()

Solution

Step 1: Identify NLTK functions for NER
NLTK uses ne_chunk() to recognize named entities from POS-tagged tokens.
Step 2: Differentiate from other functions
word_tokenize() splits text into words, pos_tag() tags parts of speech, and sent_tokenize() splits text into sentences.
Final Answer:
ne_chunk() -> Option A
Quick Check:
NER uses ne_chunk() [OK]

Hint: Use ne_chunk() after pos_tag() for NER in NLTK [OK]

Common Mistakes:

Using word_tokenize() for NER
Confusing pos_tag() with NER
Trying sent_tokenize() for entity recognition
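ne_chunk() comes after pos_tag() because chunkers consume (word, tag) pairs, not raw words. The sketch below substitutes nltk.RegexpParser, a rule-based chunker that runs without downloaded models, and uses hand-written tags in place of real pos_tag() output. It assumes NLTK is installed; the NAME label and the grammar are hypothetical. ne_chunk() replaces the hand-written rule with a trained model but takes the same kind of input.

```python
import nltk

# Hand-written tags standing in for nltk.pos_tag(nltk.word_tokenize(text)).
tagged = [
    ("Mark", "NNP"), ("Zuckerberg", "NNP"), ("founded", "VBD"),
    ("Facebook", "NNP"), ("in", "IN"), ("California", "NNP"),
]

# Rule-based chunker (not ne_chunk): group runs of proper nouns (NNP) under NAME.
chunker = nltk.RegexpParser("NAME: {<NNP>+}")
tree = chunker.parse(tagged)

# Join the words of every NAME subtree.
names = [
    " ".join(word for word, tag in subtree.leaves())
    for subtree in tree.subtrees(filter=lambda t: t.label() == "NAME")
]
print(names)  # ['Mark Zuckerberg', 'Facebook', 'California']
```

Without the tags, the chunker has nothing to match on. This is why tokenizing and tagging must both happen before chunking.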

3. What will be the output type of ne_chunk(pos_tag(word_tokenize(text))) in NLTK?

medium

A. A plain string with entity labels

B. A list of strings

C. A dictionary mapping words to entity types

D. A tree structure with named entities as subtrees

Solution

Step 1: Understand ne_chunk output
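The ne_chunk() function returns an nltk.Tree whose root is labeled 'S'. Each named entity becomes a subtree labeled with its entity type (such as PERSON or GPE), and every other token remains a plain (word, tag) tuple.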
Step 2: Compare output types
It is not a list, dictionary, or plain string but a hierarchical tree that can be traversed.
Final Answer:
A tree structure with named entities as subtrees -> Option D
Quick Check:
ne_chunk output = tree structure [OK]

Hint: ne_chunk returns a tree, not a list or dict [OK]

Common Mistakes:

Expecting a list of strings
Thinking output is a dictionary
Assuming output is a plain string
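The point that ne_chunk() returns a tree rather than a list or dictionary can be made concrete with a hand-built nltk.Tree of the same shape. The sketch assumes NLTK is installed. No models are needed because nothing is predicted. The entity labels are written by hand for illustration, and top_level_entities is a hypothetical helper name.

```python
from nltk import Tree

# Same shape as ne_chunk output: root "S", children are entity subtrees or (word, tag) tuples.
# Labels are hand-written for illustration, not model predictions.
chunked = Tree("S", [
    Tree("PERSON", [("Ada", "NNP"), ("Lovelace", "NNP")]),
    ("worked", "VBD"),
    ("in", "IN"),
    Tree("GPE", [("London", "NNP")]),
])

def top_level_entities(tree):
    """Return (label, text) for each entity subtree directly under the root."""
    result = []
    for child in tree:
        # Entity chunks are Tree objects; ordinary tokens are plain tuples.
        if isinstance(child, Tree):
            result.append((child.label(), " ".join(w for w, _ in child.leaves())))
    return result

print(chunked.label())              # S
print(top_level_entities(chunked))  # [('PERSON', 'Ada Lovelace'), ('GPE', 'London')]
```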

4. Given the code snippet:

import nltk
text = "Apple is looking at buying U.K. startup"
tokens = nltk.word_tokenize(text)
pos_tags = nltk.pos_tag(tokens)
entities = nltk.ne_chunk(pos_tags, binary=True)
print(entities)

If the goal is to label entities with specific types such as PERSON, ORGANIZATION, and GPE, what is the likely error in this code?

medium

A. Missing import for ne_chunk

B. Incorrect argument 'binary=True' in ne_chunk

C. pos_tag requires a list of sentences, not tokens

D. word_tokenize should be called after ne_chunk

Solution

Step 1: Check ne_chunk parameters
binary=True is a valid argument to ne_chunk(), but it tags every named entity with a single generic label, NE, instead of a specific type. That defeats the purpose of standard NER, which needs labels such as PERSON, ORGANIZATION, and GPE. The default, binary=False, produces those types.
Step 2: Verify other parts
The single import nltk is enough, because ne_chunk is available as nltk.ne_chunk. pos_tag() correctly takes a list of tokens, and the order (tokenize, then tag, then chunk) is correct.
Final Answer:
Incorrect argument 'binary=True' in ne_chunk -> Option B
Quick Check:
binary=True collapses all entity types into NE [OK]

Hint: Use binary=False for detailed entity types in ne_chunk [OK]

Common Mistakes:

Using binary=True for detailed NER
Calling word_tokenize after ne_chunk
Misunderstanding pos_tag input

5. You want to extract only PERSON entities from a text using NLTK's ne_chunk. Which approach correctly filters PERSON entities from the chunked tree?

hard

A. Traverse the tree and select subtrees with label 'PERSON'

B. Use pos_tag to find tokens tagged as 'PERSON'

C. Filter tokens containing capital letters only

D. Use word_tokenize and select words starting with 'P'

Solution

Step 1: Understand ne_chunk output structure
Named entities are subtrees labeled with entity types like 'PERSON', so we must traverse the tree to find these subtrees.
Step 2: Evaluate filtering methods
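pos_tag() assigns parts of speech (for example NNP for a proper noun) and never assigns entity types such as PERSON. Capitalization and a word's first letter are unreliable heuristics. Capitalized words also include sentence-initial words and place or company names, and most person names do not start with 'P'.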
Final Answer:
Traverse the tree and select subtrees with label 'PERSON' -> Option A
Quick Check:
Filter PERSON by subtree label [OK]

Hint: Filter PERSON entities by subtree label in ne_chunk tree [OK]

Common Mistakes:

Using pos_tag to find entities
Filtering by capitalization only
Selecting words by first letter
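Option A can be sketched with Tree.subtrees(), which accepts a filter function. The tree below is hand-built so that no models are needed; the sketch assumes NLTK is installed. With real text you would pass the output of nltk.ne_chunk() instead. person_entities is a hypothetical helper name. 'Paris' is capitalized and starts with 'P', so both heuristic options would wrongly include it, while the label filter does not.

```python
from nltk import Tree

def person_entities(chunked):
    """Return the text of every subtree whose label is 'PERSON'."""
    return [
        " ".join(word for word, tag in subtree.leaves())
        for subtree in chunked.subtrees(filter=lambda t: t.label() == "PERSON")
    ]

# Hand-built stand-in for nltk.ne_chunk(...) output; labels written by hand.
chunked = Tree("S", [
    Tree("PERSON", [("Marie", "NNP"), ("Curie", "NNP")]),
    ("met", "VBD"),
    Tree("PERSON", [("Pierre", "NNP")]),
    ("in", "IN"),
    Tree("GPE", [("Paris", "NNP")]),
])

print(person_entities(chunked))  # ['Marie Curie', 'Pierre']
```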
