
NER with NLTK in NLP - Practice Problems & Coding Challenges

Challenge - 5 Problems
Predict Output
intermediate
What is the output of this NER chunking code?
Given the following Python code using NLTK for Named Entity Recognition (NER), what is the output?
import nltk
from nltk import word_tokenize, pos_tag, ne_chunk
sentence = "Apple is looking at buying U.K. startup for $1 billion"
tokens = word_tokenize(sentence)
pos_tags = pos_tag(tokens)
chunks = ne_chunk(pos_tags)
print(list(chunks))
A. [('Apple', 'NNP'), ('is', 'VBZ'), ('looking', 'VBG'), ('at', 'IN'), Tree('ORGANIZATION', [('buying', 'VBG')]), ('U.K.', 'NNP'), ('startup', 'NN'), ('for', 'IN'), ('$', '$'), ('1', 'CD'), ('billion', 'CD')]
B. [Tree('GPE', [('Apple', 'NNP')]), ('is', 'VBZ'), ('looking', 'VBG'), ('at', 'IN'), ('buying', 'VBG'), Tree('GPE', [('U.K.', 'NNP')]), ('startup', 'NN'), ('for', 'IN'), ('$', '$'), ('1', 'CD'), ('billion', 'CD')]
C. [('Apple', 'NNP'), ('is', 'VBZ'), ('looking', 'VBG'), ('at', 'IN'), ('buying', 'VBG'), ('U.K.', 'NNP'), ('startup', 'NN'), ('for', 'IN'), ('$', '$'), ('1', 'CD'), ('billion', 'CD')]
D. [Tree('ORGANIZATION', [('Apple', 'NNP')]), ('is', 'VBZ'), ('looking', 'VBG'), ('at', 'IN'), ('buying', 'VBG'), Tree('GPE', [('U.K.', 'NNP')]), ('startup', 'NN'), ('for', 'IN'), ('$', '$'), ('1', 'CD'), ('billion', 'CD')]
💡 Hint
Look at how NLTK labels named entities like organizations and geopolitical entities.
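Because nltk.Tree subclasses Python's list, `list(chunks)` returns only the tree's top-level children, which is why the options above mix plain tuples with `Tree(...)` objects. The sketch below builds a small tree by hand in the same shape `ne_chunk` produces; the sentence and its labels are invented for illustration and are not real model output.

```python
from nltk import Tree

# Hand-built tree in the shape ne_chunk returns: an 'S' root whose children are
# either (word, tag) tuples or Tree subtrees labeled with an entity type.
# The labels here are illustrative, not real model output.
chunked = Tree("S", [
    Tree("PERSON", [("Mark", "NNP")]),
    ("works", "VBZ"),
    ("at", "IN"),
    Tree("ORGANIZATION", [("Google", "NNP")]),
])

children = list(chunked)  # top-level children only, as in print(list(chunks))

for child in children:
    if isinstance(child, Tree):
        # Entity chunk: its label plus the words it covers
        words = " ".join(word for word, tag in child.leaves())
        print("entity:", child.label(), words)
    else:
        # Ordinary token: a (word, POS tag) tuple left outside any chunk
        word, tag = child
        print("token: ", word, tag)
```

Checking `isinstance(child, Tree)` is the usual way to tell entity chunks apart from ordinary tagged tokens when walking this output.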
🧠 Conceptual
intermediate
Which NLTK function performs POS tagging before NER?
In the NLTK pipeline for Named Entity Recognition, which function is responsible for assigning part-of-speech tags to tokens before chunking?
A. ne_chunk()
B. word_tokenize()
C. pos_tag()
D. sent_tokenize()
💡 Hint
POS tags describe the role of each word like noun or verb.
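`pos_tag()` returns a list of `(word, tag)` tuples, and that list is what `ne_chunk()` takes as input. As a minimal sketch, the hypothetical helper `looks_pos_tagged` below (not part of NLTK) checks for that shape. The tags in the example are written by hand in Penn Treebank style rather than produced by the tagger.

```python
def looks_pos_tagged(sequence):
    """Return True if sequence is a list of (word, tag) string pairs,
    the shape pos_tag() produces and ne_chunk() expects.
    Hypothetical helper for illustration; not part of NLTK."""
    return isinstance(sequence, list) and all(
        isinstance(item, tuple)
        and len(item) == 2
        and all(isinstance(part, str) for part in item)
        for item in sequence
    )

# word_tokenize-style output: plain strings, no tags yet
tokens = ["Google", "is", "a", "tech", "giant"]

# Hand-written tags, not real tagger output
tagged = [("Google", "NNP"), ("is", "VBZ"), ("a", "DT"),
          ("tech", "NN"), ("giant", "NN")]

print(looks_pos_tagged(tokens))  # False: tokens still need pos_tag()
print(looks_pos_tagged(tagged))  # True: ready to pass to ne_chunk()
```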
Hyperparameter
advanced
What is the effect of using binary=True in ne_chunk()?
In NLTK's ne_chunk function, setting binary=True changes the output. What does this parameter do?
A. It merges all named entities into a single 'NE' label without specific types.
B. It disables named entity recognition and returns only POS tags.
C. It enables recognition of only person names, ignoring other entity types.
D. It outputs entities as plain strings instead of Tree objects.
💡 Hint
Think about simplifying entity categories.
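To picture what `binary=True` does to the labels, the sketch below relabels every entity subtree of a hand-built, typed tree as `NE`. The helper `collapse_to_binary` is hypothetical and only mimics the labeling effect: NLTK backs the real option with its own binary model, so actual chunk boundaries are not guaranteed to match the typed output.

```python
from nltk import Tree

def collapse_to_binary(tree):
    """Return a copy of an ne_chunk-style tree with every entity label
    replaced by 'NE'. Hypothetical helper that only mimics binary=True."""
    children = []
    for child in tree:
        if isinstance(child, Tree):
            # Keep the entity's tokens, drop its specific type
            children.append(Tree("NE", list(child)))
        else:
            # Non-entity (word, tag) tuples pass through unchanged
            children.append(child)
    return Tree(tree.label(), children)

# Hand-built typed tree; labels are illustrative, not real model output
typed = Tree("S", [
    Tree("PERSON", [("Mark", "NNP")]),
    ("visited", "VBD"),
    Tree("GPE", [("Paris", "NNP")]),
])

binary = collapse_to_binary(typed)
print(binary)
```

The entity spans survive, but the distinction between PERSON and GPE is gone, which is the trade-off the question is about.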
🔧 Debug
advanced
Why does this NER code raise a LookupError?
Consider this code snippet:
import nltk
sentence = "Google is a tech giant"
tokens = nltk.word_tokenize(sentence)
pos_tags = nltk.pos_tag(tokens)
chunks = nltk.ne_chunk(pos_tags)

When run, it raises a LookupError about the missing 'maxent_ne_chunker' resource. What is the cause?
A. The sentence contains unknown words causing the error.
B. The 'maxent_ne_chunker' model is not downloaded in NLTK data.
C. pos_tag() was not called before ne_chunk().
D. word_tokenize() requires a language parameter that is missing.
💡 Hint
Check if all required NLTK models are installed.
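The LookupError goes away once the missing data package is installed with `nltk.download()`; the error message itself names the package NLTK wants, and the chunker also relies on the `words` corpus. As a hedged sketch, the helpers below check which resources `nltk.data.find()` cannot locate and download only those. The resource names are an assumption that depends on your NLTK version: older releases use `punkt`, `averaged_perceptron_tagger`, and `maxent_ne_chunker`, while recent releases look for `punkt_tab`, `averaged_perceptron_tagger_eng`, and `maxent_ne_chunker_tab`.

```python
import nltk

# Assumption: package IDs and data paths used by older NLTK releases.
LEGACY_NER_RESOURCES = {
    "punkt": "tokenizers/punkt",
    "averaged_perceptron_tagger": "taggers/averaged_perceptron_tagger",
    "maxent_ne_chunker": "chunkers/maxent_ne_chunker",
    "words": "corpora/words",
}

# Assumption: the variants that recent NLTK releases look for instead.
RECENT_NER_RESOURCES = {
    "punkt_tab": "tokenizers/punkt_tab",
    "averaged_perceptron_tagger_eng": "taggers/averaged_perceptron_tagger_eng",
    "maxent_ne_chunker_tab": "chunkers/maxent_ne_chunker_tab",
    "words": "corpora/words",
}

def missing_resources(resources):
    """Return the package IDs whose data nltk.data.find() cannot locate."""
    missing = []
    for package_id, path in resources.items():
        try:
            nltk.data.find(path)  # raises LookupError when the data is absent
        except LookupError:
            missing.append(package_id)
    return missing

def download_missing(resources):
    """Download only the missing packages (needs network access)."""
    for package_id in missing_resources(resources):
        nltk.download(package_id, quiet=True)
```

Calling `download_missing(...)` once with the set whose names match your LookupError message, before tokenizing, should let `ne_chunk()` load its model; checking first avoids re-downloading data that is already present.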
Model Choice
expert
Which NLTK model is used internally by ne_chunk for NER?
NLTK's ne_chunk function uses a pre-trained model internally. Which model does it use for Named Entity Recognition?
A. A Maximum Entropy classifier trained on the ACE corpus
B. A Hidden Markov Model trained on the CoNLL 2003 dataset
C. A Maximum Entropy classifier trained on the CoNLL 2000 corpus
D. A Maximum Entropy classifier trained on the ACE corpus and Treebank data
💡 Hint
Consider the training data NLTK mentions for ne_chunk.

Practice

1. What is the main purpose of Named Entity Recognition (NER) in Natural Language Processing?
easy
A. To count the number of words in a sentence
B. To translate text from one language to another
C. To find names of people, places, and organizations in text
D. To correct spelling mistakes in text

Solution

  1. Step 1: Understand NER's role

    NER is designed to identify and classify named entities like people, places, and organizations in text.
  2. Step 2: Compare with other NLP tasks

    Translation, word counting, and spell checking are different tasks unrelated to NER.
  3. Final Answer:

    To find names of people, places, and organizations in text -> Option C
  4. Quick Check:

    NER = Find names [OK]
Hint: NER finds names of people, places, and organizations in text [OK]
Common Mistakes:
  • Confusing NER with translation
  • Thinking NER counts words
  • Mixing NER with spell checking
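In NLTK terms, "finding names" means reading the labeled subtrees out of the `ne_chunk` tree. A minimal sketch, using a hypothetical helper `entities_by_type` and a hand-built tree whose labels are illustrative rather than real model output, groups the entity strings by type:

```python
from collections import defaultdict
from nltk import Tree

def entities_by_type(tree):
    """Map each entity label (PERSON, GPE, ...) to the entity strings found
    under it in an ne_chunk-style tree. Hypothetical helper for illustration."""
    grouped = defaultdict(list)
    # subtrees() yields the root first, so exclude it and keep only entity chunks
    for subtree in tree.subtrees(filter=lambda t: t is not tree):
        text = " ".join(word for word, tag in subtree.leaves())
        grouped[subtree.label()].append(text)
    return dict(grouped)

# Hand-built tree; labels are illustrative, not real ne_chunk output
sentence_tree = Tree("S", [
    Tree("PERSON", [("Ada", "NNP"), ("Lovelace", "NNP")]),
    ("lived", "VBD"),
    ("in", "IN"),
    Tree("GPE", [("London", "NNP")]),
])

print(entities_by_type(sentence_tree))
```

Multi-word entities such as "Ada Lovelace" come back as one string because they share a single subtree.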
2. Which NLTK function is used to perform Named Entity Recognition after POS tagging?
easy
A. ne_chunk()
B. word_tokenize()
C. pos_tag()
D. sent_tokenize()

Solution

  1. Step 1: Identify NLTK functions for NER

    NLTK uses ne_chunk() to recognize named entities from POS-tagged tokens.
  2. Step 2: Differentiate from other functions

    word_tokenize() splits text into words, pos_tag() tags parts of speech, and sent_tokenize() splits text into sentences.
  3. Final Answer:

    ne_chunk() -> Option A
  4. Quick Check:

    NER uses ne_chunk() [OK]
Hint: Use ne_chunk() after pos_tag() for NER in NLTK [OK]
Common Mistakes:
  • Using word_tokenize() for NER
  • Confusing pos_tag() with NER
  • Trying sent_tokenize() for entity recognition
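Putting the order together, a minimal sketch of the full pipeline is below. It assumes the tokenizer, tagger, chunker, and `words` data packages are already installed (otherwise NLTK raises LookupError), and the function name `chunk_named_entities` is hypothetical.

```python
import nltk

def chunk_named_entities(text, binary=False):
    """Tokenize, POS-tag, then chunk named entities, in that order.
    Hypothetical wrapper; requires NLTK's tokenizer, tagger, chunker,
    and words data to be installed."""
    tokens = nltk.word_tokenize(text)             # 1. split text into word tokens
    tagged = nltk.pos_tag(tokens)                 # 2. attach a POS tag to each token
    return nltk.ne_chunk(tagged, binary=binary)   # 3. group tagged tokens into entity chunks
```

For example, `chunk_named_entities("Mark works at Google in London").pprint()` prints the resulting tree; the labels you see depend on the installed model.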
3. What will be the output type of ne_chunk(pos_tag(word_tokenize(text))) in NLTK?
medium
A. A plain string with entity labels
B. A list of strings
C. A dictionary mapping words to entity types
D. A tree structure with named entities as subtrees

Solution

  1. Step 1: Understand ne_chunk output

    The ne_chunk() function returns a tree structure where named entities are subtrees labeled with entity types.
  2. Step 2: Compare output types

    Although nltk.Tree subclasses Python's list, the result is not a flat list of strings, a dictionary, or a plain string: it is a hierarchical tree rooted at 'S' that can be traversed.
  3. Final Answer:

    A tree structure with named entities as subtrees -> Option D
  4. Quick Check:

    ne_chunk output = tree structure [OK]
Hint: ne_chunk returns an nltk.Tree, not a flat list of strings or a dict [OK]
Common Mistakes:
  • Expecting a list of strings
  • Thinking output is a dictionary
  • Assuming output is a plain string
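A quick way to confirm the output type is to inspect a tree directly. The sketch below uses a hand-built tree in the shape `ne_chunk` returns (root label `S`, entity label chosen for illustration) and shows that it behaves both as a tree and, because Tree subclasses list, as an indexable sequence of children.

```python
from nltk import Tree

# Hand-built tree in ne_chunk's shape; the GPE label is illustrative
tree = Tree("S", [
    Tree("GPE", [("Paris", "NNP")]),
    ("is", "VBZ"),
    ("beautiful", "JJ"),
])

print(type(tree).__name__)     # Tree
print(tree.label())            # S: the sentence root
print(isinstance(tree, list))  # True: Tree subclasses list
print(tree[0].label())         # GPE: the first child is an entity subtree
print(tree.leaves())           # every (word, tag) leaf, entity tokens included
```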
4. Given the code snippet:
import nltk
text = "Apple is looking at buying U.K. startup"
tokens = nltk.word_tokenize(text)
pos_tags = nltk.pos_tag(tokens)
entities = nltk.ne_chunk(pos_tags, binary=True)
print(entities)

The goal is to label entities with specific types such as PERSON, ORGANIZATION, and GPE. What is wrong with this code for that goal?
medium
A. Missing import for ne_chunk
B. The argument binary=True in ne_chunk discards the entity types
C. pos_tag requires a list of sentences, not tokens
D. word_tokenize should be called after ne_chunk

Solution

  1. Step 1: Check ne_chunk parameters

    binary=True is a valid parameter and the code runs, but it makes ne_chunk() label every named entity with the single generic tag NE instead of specific types like PERSON, ORGANIZATION, or GPE.
  2. Step 2: Verify other parts

    Imports are correct with import nltk, pos_tag() accepts a list of tokens, and the preprocessing order is correct.
  3. Final Answer:

    The argument binary=True in ne_chunk discards the entity types -> Option B
  4. Quick Check:

    binary=True gives NE only; the default binary=False gives typed labels [OK]
Hint: Use the default binary=False in ne_chunk to get detailed entity types [OK]
Common Mistakes:
  • Using binary=True for detailed NER
  • Calling word_tokenize after ne_chunk
  • Misunderstanding pos_tag input
5. You want to extract only PERSON entities from a text using NLTK's ne_chunk. Which approach correctly filters PERSON entities from the chunked tree?
hard
A. Traverse the tree and select subtrees with label 'PERSON'
B. Use pos_tag to find tokens tagged as 'PERSON'
C. Filter tokens containing capital letters only
D. Use word_tokenize and select words starting with 'P'

Solution

  1. Step 1: Understand ne_chunk output structure

    Named entities are subtrees labeled with entity types like 'PERSON', so we must traverse the tree to find these subtrees.
  2. Step 2: Evaluate filtering methods

    pos_tag does not label entities, only parts of speech. Capital letters or starting with 'P' are unreliable heuristics.
  3. Final Answer:

    Traverse the tree and select subtrees with label 'PERSON' -> Option A
  4. Quick Check:

    Filter PERSON by subtree label [OK]
Hint: Filter PERSON entities by subtree label in ne_chunk tree [OK]
Common Mistakes:
  • Using pos_tag to find entities
  • Filtering by capitalization only
  • Selecting words by first letter
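A minimal sketch of the correct approach, using a hypothetical helper `person_entities` on a hand-built tree (labels are illustrative, not real `ne_chunk` output), filters subtrees by label with `Tree.subtrees(filter=...)`. For contrast, it also shows how a capitalization heuristic picks up a non-person word.

```python
from nltk import Tree

def person_entities(tree):
    """Return the text of every subtree labeled PERSON in an ne_chunk-style tree.
    Hypothetical helper for illustration."""
    return [
        " ".join(word for word, tag in subtree.leaves())
        for subtree in tree.subtrees(filter=lambda t: t.label() == "PERSON")
    ]

# Hand-built tree; labels are illustrative, not real ne_chunk output
tree = Tree("S", [
    Tree("PERSON", [("Grace", "NNP"), ("Hopper", "NNP")]),
    ("joined", "VBD"),
    ("the", "DT"),
    Tree("ORGANIZATION", [("Navy", "NNP")]),
])

print(person_entities(tree))  # ['Grace Hopper']

# Capitalization alone also picks up 'Navy', which is not a person
capitalized = [word for word, tag in tree.leaves() if word[:1].isupper()]
print(capitalized)  # ['Grace', 'Hopper', 'Navy']
```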