
Custom NER training basics in NLP - Practice Problems & Coding Challenges

Challenge - 5 Problems
Predict Output
intermediate
Output of training loop snippet for custom NER
Which printed output is most plausible after running this training loop for 3 iterations? (Exact loss values vary between runs; focus on the pattern.)
import spacy
from spacy.training import Example

# Start from a blank English pipeline and add an NER component with one label
nlp = spacy.blank('en')
ner = nlp.add_pipe('ner')
ner.add_label('ANIMAL')

# In spaCy v3, begin_training() was renamed to initialize()
optimizer = nlp.initialize()

# Entity offsets are character positions; the end index is exclusive
TRAIN_DATA = [
    ('I have a dog', {'entities': [(9, 12, 'ANIMAL')]}),
    ('She owns a cat', {'entities': [(11, 14, 'ANIMAL')]})
]

for i in range(3):
    losses = {}  # reset so each iteration reports its own accumulated loss
    for text, annotations in TRAIN_DATA:
        doc = nlp.make_doc(text)
        example = Example.from_dict(doc, annotations)
        nlp.update([example], sgd=optimizer, losses=losses)
    print(f'Iteration {i+1}, Losses: {losses}')
A
Iteration 1, Losses: {'ner': 0.5}
Iteration 2, Losses: {'ner': 0.3}
Iteration 3, Losses: {'ner': 0.1}
B
Iteration 1, Losses: {'ner': 0.0}
Iteration 2, Losses: {'ner': 0.0}
Iteration 3, Losses: {'ner': 0.0}
C
Iteration 1, Losses: {'ner': 0.5}
Iteration 2, Losses: {'ner': 0.0}
Iteration 3, Losses: {'ner': 0.0}
D
Iteration 1, Losses: {'ner': 0.0}
Iteration 2, Losses: {'ner': 0.3}
Iteration 3, Losses: {'ner': 0.1}
💡 Hint
Losses usually decrease as training progresses but start from a positive value.
Model Choice
intermediate
Choosing the right model architecture for custom NER
Which model architecture is best suited for training a custom Named Entity Recognition (NER) system from scratch?
A. A convolutional neural network (CNN) designed for image classification
B. A transformer-based model like BERT fine-tuned for token classification
C. A recurrent neural network (RNN) with LSTM layers for sequence labeling
D. A simple feedforward neural network without sequence context
💡 Hint
NER requires understanding context around each word in a sentence.
Hyperparameter
advanced
Effect of batch size on custom NER training
What is the most likely effect of increasing the batch size during training of a custom NER model?
A. Training becomes slower but model generalizes better
B. Training becomes faster and model always achieves higher accuracy
C. Training speed and model performance remain unchanged
D. Each epoch runs faster with fewer weight updates, but the model may generalize less well
💡 Hint
Think about how many examples the model sees before updating weights.
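To make the hint concrete, here is a minimal pure-Python sketch of how batch size determines the number of weight updates per pass over the data. The helpers `chunks` and `updates_per_epoch` are hypothetical names written for this illustration; spaCy ships its own batching helper, `spacy.util.minibatch`, which could be used instead in a real training loop.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def chunks(items: Sequence[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive batches of at most `size` items (hypothetical helper)."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def updates_per_epoch(n_examples: int, batch_size: int) -> int:
    """Count how many batches (one weight update each) cover the dataset once."""
    return len(list(chunks(range(n_examples), batch_size)))


# Larger batches mean fewer updates per epoch, each based on more examples.
for batch_size in (1, 4, 16):
    print(batch_size, updates_per_epoch(100, batch_size))
```

With 100 examples, moving from a batch size of 1 to 16 cuts the number of updates per epoch from 100 to 7, which is why the batch size changes both training speed and the character of each update.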
Metrics
advanced
Choosing the right metric for custom NER evaluation
Which metric best evaluates the performance of a custom NER model on a test set?
A. Mean squared error between predicted and true entity labels
B. Accuracy of token classification ignoring entity boundaries
C. Precision, Recall, and F1-score based on exact entity matches
D. Confusion matrix of sentence-level classification
💡 Hint
NER evaluation requires matching whole entities, not just tokens.
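As an illustration of exact-match scoring, the sketch below computes entity-level precision, recall, and F1 for a single document. The function `entity_prf` and the sample spans are assumptions made for this example, not part of spaCy; in a spaCy v3 pipeline, `nlp.evaluate(examples)` reports comparable scores under the keys `ents_p`, `ents_r`, and `ents_f`.

```python
from typing import Dict, Iterable, Tuple

Span = Tuple[int, int, str]  # (start_char, end_char, label)


def entity_prf(gold: Iterable[Span], pred: Iterable[Span]) -> Dict[str, float]:
    """Score predictions by exact match on start, end, and label (hypothetical helper)."""
    gold_set, pred_set = set(gold), set(pred)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Made-up spans: the second prediction has the right label but is one
# character short, so it does not count as a match.
gold = [(0, 5, "ORG"), (19, 24, "GPE")]
pred = [(0, 5, "ORG"), (19, 23, "GPE")]
scores = entity_prf(gold, pred)
print(scores)
```

A token-level accuracy score would give partial credit for the truncated span; exact-match scoring does not, which is why it is the stricter and more informative measure for NER.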
🔧 Debug
expert
Identifying cause of poor entity recognition in custom NER
After training a custom NER model, it fails to recognize any entities in new sentences. Which is the most likely cause?
A. The training data had no entity annotations or was empty
B. The model was trained with too many epochs causing overfitting
C. The optimizer was set to None during training
D. The model was trained on a different language than the test sentences
💡 Hint
If the model never saw entities during training, it cannot learn to recognize them.

Practice

1. What is the main goal of custom NER training in NLP?
easy
A. To summarize long documents automatically
B. To teach the model to recognize specific words or phrases you label
C. To translate text from one language to another
D. To generate new text based on a prompt

Solution

  1. Step 1: Understand what NER means

    NER stands for Named Entity Recognition: finding spans of text that name something (such as a person, organization, or place) and assigning each span a label.
  2. Step 2: Identify the purpose of custom training

    Custom NER training teaches the model to find your special labeled words, not general tasks like translation or summarization.
  3. Final Answer:

    To teach the model to recognize specific words or phrases you label -> Option B
  4. Quick Check:

    Custom NER = Recognize labeled words [OK]
Hint: Custom NER means teaching the model your own labeled words [OK]
Common Mistakes:
  • Confusing NER with translation or summarization
  • Thinking NER generates new text
  • Assuming NER works without labeled data
2. Which of the following is the correct way to label a sentence for custom NER training in Python spaCy format?
easy
A. ('Apple is a company', {'entities': [(0, 5, 'ORG')]})
B. ('Apple is a company', {'labels': [(0, 5, 'ORG')]})
C. ('Apple is a company', {'entities': [(6, 7, 'ORG')]})
D. ('Apple is a company', {'entities': [(0, 5, 'PERSON')]})

Solution

  1. Step 1: Check the labeling key

    spaCy uses the 'entities' key, not 'labels', to hold labeled spans.
  2. Step 2: Verify the span and label

    Span (0, 5) covers 'Apple' exactly (the end index is exclusive), and the label 'ORG' (organization) fits. The span (6, 7) covers only the 'i' in 'is', and 'PERSON' is the wrong label for a company.
  3. Final Answer:

    ('Apple is a company', {'entities': [(0, 5, 'ORG')]}) -> Option A
  4. Quick Check:

    Correct key and span = ('Apple is a company', {'entities': [(0, 5, 'ORG')]}) [OK]
Hint: Use 'entities' key with correct span and label [OK]
Common Mistakes:
  • Using 'labels' instead of 'entities'
  • Incorrect character span for entity
  • Wrong entity type label
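Incorrect character spans are easy to catch before training by slicing the text with each annotated offset. The following is a minimal sketch in plain Python; `show_spans` and `find_span` are hypothetical helpers introduced here, and `find_span` assumes the phrase occurs in the text and that its first occurrence is the one you mean.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start_char, end_char, label); end is exclusive


def show_spans(text: str, entities: List[Span]) -> List[Tuple[str, str]]:
    """Return the substring each annotation actually covers, paired with its label."""
    return [(text[start:end], label) for start, end, label in entities]


def find_span(text: str, phrase: str, label: str) -> Span:
    """Compute offsets for the first occurrence of `phrase` (raises ValueError if absent)."""
    start = text.index(phrase)
    return (start, start + len(phrase), label)


text = "Apple is a company"
print(show_spans(text, [(0, 5, "ORG")]))   # covers the whole word 'Apple'
print(show_spans(text, [(6, 7, "ORG")]))   # covers a single character of 'is'
print(find_span("I love Paris", "Paris", "GPE"))
```

Computing offsets from the phrase itself, rather than counting characters by hand, avoids off-by-one spans altogether.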
3. Given this training data snippet for custom NER:
TRAIN_DATA = [
  ('I love Paris', {'entities': [(7, 12, 'GPE')]})
]
What will the model predict for the sentence 'I love Paris' after training?
medium
A. [] (no entities)
B. [('I', 'GPE')]
C. [('Paris', 'GPE')]
D. [('love', 'GPE')]

Solution

  1. Step 1: Understand the labeled entity

    The training data labels 'Paris' from character 7 to 12 as 'GPE' (Geopolitical entity).
  2. Step 2: Predict model output after training

    The model learns to recognize 'Paris' as 'GPE' and should predict [('Paris', 'GPE')] for the same sentence.
  3. Final Answer:

    [('Paris', 'GPE')] -> Option C
  4. Quick Check:

    Entity span matches 'Paris' = [('Paris', 'GPE')] [OK]
Hint: Model predicts labeled spans from training data [OK]
Common Mistakes:
  • Confusing entity span with other words
  • Expecting no entities if training is done
  • Mixing entity labels
4. You wrote this code to add a new entity label to your NER model:
ner.add_label('ANIMAL')
But after training, the model never detects 'ANIMAL' entities. What is the most likely mistake?
medium
A. The label 'ANIMAL' is reserved and cannot be used
B. You used the wrong method name; it should be add_entity()
C. You need to call ner.remove_label('ANIMAL') before adding
D. You forgot to include training examples with 'ANIMAL' labels

Solution

  1. Step 1: Check the method usage

    ner.add_label('ANIMAL') is correct to add a new label. There is no add_entity() method, no need to call remove_label first, and 'ANIMAL' is not reserved.
  2. Step 2: Verify training data

    The model learns only from examples. Without training examples labeled 'ANIMAL', it has nothing to learn the label from and cannot detect it.
  3. Final Answer:

    You forgot to include training examples with 'ANIMAL' labels -> Option D
  4. Quick Check:

    Training data needed for new labels = You forgot to include training examples with 'ANIMAL' labels [OK]
Hint: Add labeled examples for new entity labels [OK]
Common Mistakes:
  • Assuming that adding a label alone trains the model
  • Using wrong method names
  • Thinking labels are reserved keywords
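One way to catch this mistake early is to count how many annotated examples each label has before training starts. The sketch below is plain Python over spaCy-style training tuples; `label_counts` and `labels_without_examples` are hypothetical helper names, and the sample data is made up for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

TrainItem = Tuple[str, Dict[str, list]]


def label_counts(train_data: Iterable[TrainItem]) -> Counter:
    """Count annotated spans per label across all training items."""
    counts: Counter = Counter()
    for _text, annotations in train_data:
        for _start, _end, label in annotations.get("entities", []):
            counts[label] += 1
    return counts


def labels_without_examples(added_labels: Iterable[str],
                            train_data: Iterable[TrainItem]) -> List[str]:
    """Return labels that were added to the component but never appear in the data."""
    counts = label_counts(train_data)
    return sorted(label for label in added_labels if counts[label] == 0)


# Made-up data: 'ANIMAL' was added as a label but has no annotated examples.
TRAIN_DATA = [
    ("I love Paris", {"entities": [(7, 12, "GPE")]}),
    ("Nothing to tag here", {"entities": []}),
]
missing = labels_without_examples(["ANIMAL", "GPE"], TRAIN_DATA)
print(missing)
```

The same counts also show at a glance whether the labels are roughly balanced.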
5. You want to train a custom NER model to recognize two new entity types: 'FOOD' and 'DRINK'. You have labeled training data for both. Which of the following is the best approach to ensure the model learns both correctly?
hard
A. Add both labels with ner.add_label(), include balanced training examples for each, and train in multiple iterations
B. Add only 'FOOD' label first, train fully, then add 'DRINK' label and train again
C. Train the model without adding labels explicitly; it will learn automatically
D. Add labels but use only examples for 'FOOD' to avoid confusion

Solution

  1. Step 1: Add all new labels before training

    Adding both the 'FOOD' and 'DRINK' labels upfront ensures the model knows every label it has to learn.
  2. Step 2: Provide balanced training data and train iteratively

    Balanced examples for both labels, combined with multiple training iterations, help the model learn both well.
  3. Final Answer:

    Add both labels with ner.add_label(), include balanced training examples for each, and train in multiple iterations -> Option A
  4. Quick Check:

    All labels + balanced data + training = Add both labels with ner.add_label(), include balanced training examples for each, and train in multiple iterations [OK]
Hint: Add all labels and balanced data before training [OK]
Common Mistakes:
  • Adding labels one by one with separate training
  • Skipping label addition
  • Training with unbalanced or missing examples
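The approach in Option A can be sketched with spaCy v3 as follows. This is an illustrative outline under stated assumptions, not a tuned recipe: the four sentences, the batch size of 2, and the five iterations are made up for the example, and a real model would need far more data.

```python
import random

import spacy
from spacy.training import Example
from spacy.util import minibatch

# Made-up, balanced training data: two examples per label.
TRAIN_DATA = [
    ("I ate pizza", {"entities": [(6, 11, "FOOD")]}),
    ("We ordered pasta", {"entities": [(11, 16, "FOOD")]}),
    ("She drank coffee", {"entities": [(10, 16, "DRINK")]}),
    ("He likes tea", {"entities": [(9, 12, "DRINK")]}),
]

nlp = spacy.blank("en")
ner = nlp.add_pipe("ner")

# Add every label found in the data before training starts.
for _text, annotations in TRAIN_DATA:
    for _start, _end, label in annotations["entities"]:
        ner.add_label(label)

examples = [
    Example.from_dict(nlp.make_doc(text), annotations)
    for text, annotations in TRAIN_DATA
]
optimizer = nlp.initialize(lambda: examples)

random.seed(0)
for iteration in range(5):
    random.shuffle(examples)  # mix FOOD and DRINK examples in every pass
    losses = {}
    for batch in minibatch(examples, size=2):
        nlp.update(batch, sgd=optimizer, losses=losses)
    print(f"Iteration {iteration + 1}, Losses: {losses}")
```

Shuffling on each pass keeps both labels mixed within the batches, so neither entity type dominates a stretch of updates.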