
NER with spaCy in NLP - Cheat Sheet & Quick Revision

Recall & Review
beginner
What does NER stand for in spaCy?
NER stands for Named Entity Recognition: the task of finding named entities in text, such as people, organizations, places, and dates, and labeling each one with its type.
beginner
How does spaCy represent entities found in text?
spaCy stores the entities it finds in the Doc.ents attribute. Each entity is a Span object whose label_ attribute holds a label such as PERSON, ORG, or DATE.
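The minimal sketch below reads Doc.ents. It assumes spaCy v3. To keep it runnable without downloading a trained model, a blank English pipeline with a rule-based entity_ruler stands in for the statistical NER. The patterns and the sentence are illustrative. With spacy.load('en_core_web_sm') the loop over doc.ents is the same.

```python
import spacy
from spacy.tokens import Span

# Assumption: a blank pipeline plus an entity_ruler stands in for a trained model,
# so this sketch runs without downloading en_core_web_sm.
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "ORG", "pattern": "Apple"},
    {"label": "GPE", "pattern": "Paris"},
])

doc = nlp("Apple opened an office in Paris")

# Each entity is a Span: its text, string label, and character offsets.
rows = [(ent.text, ent.label_, ent.start_char, ent.end_char) for ent in doc.ents]
for row in rows:
    print(row)
# ('Apple', 'ORG', 0, 5)
# ('Paris', 'GPE', 26, 31)
```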
beginner
What is the purpose of the nlp object in spaCy?
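The nlp object, returned by spacy.load, is the main processing pipeline. Calling it on a string first tokenizes the text. It then runs each pipeline component in order (such as the tagger, parser, and entity recognizer) and returns a Doc holding the results.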
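The sketch below shows the nlp object as a pipeline, assuming spaCy v3. spacy.blank('en') creates a pipeline with only a tokenizer. Each component added with add_pipe then runs, in order, on every text. The entity_ruler and its pattern are illustrative stand-ins. A trained pipeline such as en_core_web_sm already lists components like 'tagger', 'parser', and 'ner' in nlp.pipe_names.

```python
import spacy

# A blank pipeline has a tokenizer but no trained components.
nlp = spacy.blank("en")
print(nlp.pipe_names)  # []

# Components run in order after tokenization; this rule-based entity_ruler
# stands in for the statistical "ner" component of a trained pipeline.
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([{"label": "GPE", "pattern": "Berlin"}])
print(nlp.pipe_names)  # ['entity_ruler']

# nlp(text) returns one Doc; nlp.pipe(texts) streams Docs for many texts.
texts = ["She moved to Berlin.", "Nothing to see here."]
docs = list(nlp.pipe(texts))
for doc in docs:
    print([t.text for t in doc], [(ent.text, ent.label_) for ent in doc.ents])
```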
intermediate
How can you add a new entity label to spaCy's NER model?
Get the NER pipeline component and register the new label with its add_label method. Then train the model on annotated examples that include the new label.
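The minimal sketch below trains a new label, assuming spaCy v3's nlp.initialize / nlp.update training API. Several pieces are hypothetical: the GADGET label, the three sentences, the make_example helper, and the epoch count. A real model needs many more, and more varied, examples.

This sketch starts from a blank pipeline. For an existing model such as en_core_web_sm, you would instead fetch the component with nlp.get_pipe("ner") before calling add_label. You would also call nlp.resume_training() rather than nlp.initialize() so the existing weights are kept.

```python
import random

import spacy
from spacy.training import Example
from spacy.util import fix_random_seed

fix_random_seed(0)
random.seed(0)

# Hypothetical label and toy sentences (text, entity substring).
LABEL = "GADGET"
RAW = [
    ("I just bought an iPhone", "iPhone"),
    ("My Pixel keeps crashing", "Pixel"),
    ("The Kindle battery lasts for weeks", "Kindle"),
]

nlp = spacy.blank("en")
ner = nlp.add_pipe("ner")
ner.add_label(LABEL)  # register the new entity label on the NER component

def make_example(text, span_text):
    # Hypothetical helper: turn the entity substring into (start, end, label) offsets.
    start = text.index(span_text)
    annotations = {"entities": [(start, start + len(span_text), LABEL)]}
    return Example.from_dict(nlp.make_doc(text), annotations)

examples = [make_example(text, span) for text, span in RAW]

# Initialize weights from the examples, then run a few update passes.
optimizer = nlp.initialize(lambda: examples)
for epoch in range(20):
    random.shuffle(examples)
    losses = {}
    nlp.update(examples, drop=0.2, sgd=optimizer, losses=losses)

print(ner.labels)
print(losses)
```

The reported loss should generally fall across epochs. With only three examples, though, the model will not generalize; treat this as the shape of the workflow, not a recipe.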
beginner
What is the role of training data in improving spaCy's NER model?
Training data pairs example texts with their correct entity spans and labels. Updating the model on such examples helps it recognize entities more accurately, including new entity types and entities specific to your domain.
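Training examples in spaCy are usually written as character offsets into the text. A useful sanity check converts those offsets into per-token BILUO tags (Begin, In, Last, Unit, Out). This shows whether each span lines up with token boundaries. The sketch below assumes spaCy v3's offsets_to_biluo_tags helper; the sentence and labels are illustrative.

```python
import spacy
from spacy.training import offsets_to_biluo_tags

nlp = spacy.blank("en")

# One annotated example in the (text, {"entities": [(start, end, label)]}) style.
text = "Sundar Pichai runs Google"
annotations = {"entities": [(0, 13, "PERSON"), (19, 25, "ORG")]}

doc = nlp.make_doc(text)
# Map character offsets onto tokens to confirm the spans align with token boundaries.
tags = offsets_to_biluo_tags(doc, annotations["entities"])
print(list(zip([t.text for t in doc], tags)))
# [('Sundar', 'B-PERSON'), ('Pichai', 'L-PERSON'), ('runs', 'O'), ('Google', 'U-ORG')]
```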
Which spaCy attribute gives you the entities found in a text?
A. Doc.ents
B. Doc.tokens
C. Doc.text
D. Doc.sents
What type of information does NER extract from text?
A. Sentiment scores
B. Topic categories
C. Part-of-speech tags
D. Named entities like people, places, and dates
How do you process text with spaCy to find entities?
A. Pass text to the nlp object
B. Call Doc.ents directly on raw text
C. Use the nlp.entities() function
D. Use nlp.tokenize() only
Which of these is NOT a common entity label in spaCy?
A. ORG
B. COLOR
C. PERSON
D. DATE
What do you need to improve spaCy's NER model for a new entity type?
A. Adding more stop words
B. Only changing the model code
C. Training data with examples of the new entity
D. Increasing the text length
Explain how spaCy's NER pipeline works to find entities in text.
Think about how text flows through spaCy and how entities are marked.
Describe the steps to add a new custom entity label to spaCy's NER model.
Focus on data preparation and model training.

      Practice

      1. What does NER (Named Entity Recognition) do in natural language processing?
      easy
      A. It generates new text based on input prompts.
      B. It translates text from one language to another.
      C. It summarizes long documents into short paragraphs.
      D. It finds and labels important names and terms in text automatically.

      Solution

      1. Step 1: Understand NER's purpose

        NER identifies specific names like people, places, or organizations in text.
      2. Step 2: Compare with other NLP tasks

        Translation, summarization, and text generation are tasks distinct from NER.
      3. Final Answer:

        It finds and labels important names and terms in text automatically. -> Option D
      4. Quick Check:

        NER = Finds names and terms [OK]
      Hint: NER extracts names and terms, not translations or summaries [OK]
      Common Mistakes:
      • Confusing NER with translation or summarization
      • Thinking NER generates new text
      • Believing NER only finds keywords, not named entities
      2. Which of the following is the correct way to load a pre-trained spaCy model for NER?
      easy
      A. import spacy; nlp = spacy.load('en_core_web_sm')
      B. import spacy; nlp = spacy.model('en_core_web_sm')
      C. import spacy; nlp = spacy.load_model('en_core_web_sm')
      D. import spacy; nlp = spacy.get('en_core_web_sm')

      Solution

      1. Step 1: Recall spaCy model loading syntax

        spaCy uses spacy.load('model_name') to load pre-trained models.
      2. Step 2: Check each option

        Only option A calls spacy.load; spacy.model, spacy.load_model, and spacy.get are not spaCy functions.
      3. Final Answer:

        import spacy; nlp = spacy.load('en_core_web_sm') -> Option A
      4. Quick Check:

        spaCy model loading = spacy.load() [OK]
      Hint: Use spacy.load('model_name') to load models [OK]
      Common Mistakes:
      • Using spacy.model or spacy.load_model which don't exist
      • Trying spacy.get which is not a spaCy function
      • Forgetting to import spacy before loading
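spacy.load only works if the named pipeline package is installed, for example via python -m spacy download en_core_web_sm. Otherwise it raises an OSError. The load_pipeline helper below is a hypothetical convenience wrapper. Falling back to a blank pipeline without NER is an assumed design choice, meant for scripts that should still run.

```python
import warnings

import spacy

def load_pipeline(name="en_core_web_sm"):
    """Hypothetical helper: load a trained pipeline, or fall back to a blank one."""
    try:
        return spacy.load(name)
    except OSError:
        # spacy.load raises OSError when the package is not installed;
        # install it with: python -m spacy download en_core_web_sm
        warnings.warn(f"Pipeline {name!r} not found; using spacy.blank('en') (no NER).")
        return spacy.blank("en")

nlp = load_pipeline()
print(nlp.pipe_names)
```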
      3. Given this code snippet using spaCy for NER:
      import spacy
      nlp = spacy.load('en_core_web_sm')
      doc = nlp('Apple is looking at buying U.K. startup for $1 billion')
      entities = [(ent.text, ent.label_) for ent in doc.ents]
      print(entities)

      What will be the output?
      medium
      A. [('Apple', 'PERSON'), ('U.K.', 'ORG'), ('$1 billion', 'QUANTITY')]
      B. [('Apple', 'ORG'), ('startup', 'ORG'), ('$1 billion', 'MONEY')]
      C. [('Apple', 'ORG'), ('U.K.', 'GPE'), ('$1 billion', 'MONEY')]
      D. [('Apple', 'GPE'), ('U.K.', 'GPE'), ('$1 billion', 'MONEY')]

      Solution

      1. Step 1: Understand spaCy NER labels

        The en_core_web_sm model tags Apple as an organization (ORG), U.K. as a geopolitical entity (GPE), and $1 billion as a monetary value (MONEY).
      2. Step 2: Match entities with labels

        Only option C pairs these three entities with these labels; 'startup' is a common noun, not a named entity.
      3. Final Answer:

        [('Apple', 'ORG'), ('U.K.', 'GPE'), ('$1 billion', 'MONEY')] -> Option C
      4. Quick Check:

        spaCy NER output matches [('Apple', 'ORG'), ('U.K.', 'GPE'), ('$1 billion', 'MONEY')] [OK]
      Hint: Check spaCy's common entity labels for correct matches [OK]
      Common Mistakes:
      • Confusing ORG with PERSON or GPE
      • Mislabeling MONEY as QUANTITY
      • Including words like 'startup' as entities
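To check what a label means, spacy.explain returns a short description from spaCy's glossary, or None for an unknown label. Comparing MONEY with QUANTITY this way helps avoid the mislabeling mistake listed above. This sketch assumes only that spaCy is installed; no model download is needed.

```python
import spacy

# spacy.explain looks up a label in spaCy's glossary; unknown labels give None.
for label in ("ORG", "GPE", "MONEY", "QUANTITY", "PERSON"):
    print(f"{label:<9}{spacy.explain(label)}")
```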
      4. You run this code but get an error:
      import spacy
      doc = nlp('Google is a tech giant')

      What is the most likely cause?
      medium
      A. spaCy does not support the word 'Google'.
      B. The variable 'nlp' is not defined before use.
      C. The text input is too short for NER.
      D. Missing parentheses in the print statement.

      Solution

      1. Step 1: Check variable definitions

        The code uses 'nlp' without defining it by loading a spaCy model first.
      2. Step 2: Identify error cause

        This causes a NameError because 'nlp' is undefined.
      3. Final Answer:

        The variable 'nlp' is not defined before use. -> Option B
      4. Quick Check:

        Undefined variable 'nlp' causes error [OK]
      Hint: Always load model with spacy.load before using nlp [OK]
      Common Mistakes:
      • Assuming text length causes error
      • Thinking spaCy can't recognize common words
      • Confusing print syntax errors with variable errors
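The fix is to create nlp before calling it. The sketch below first reproduces the NameError, then defines the pipeline. It uses spacy.blank('en') only so it runs without a downloaded model, which means the resulting Doc has tokens but no entities. In real code the assignment would be nlp = spacy.load('en_core_web_sm').

```python
import spacy

text = "Google is a tech giant"

# Reproduce the bug: `nlp` has not been assigned yet, so Python raises NameError.
caught = None
try:
    doc = nlp(text)  # noqa: F821 (intentionally undefined here)
except NameError as err:
    caught = err
    print(f"Caught NameError: {err}")

# Fix: create the pipeline before calling it. spacy.blank("en") keeps the sketch
# self-contained but has no NER; use spacy.load("en_core_web_sm") to get entities.
nlp = spacy.blank("en")
doc = nlp(text)
print([token.text for token in doc])  # ['Google', 'is', 'a', 'tech', 'giant']
```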
      5. You want to extract only person names from a text using spaCy's NER. Which code snippet correctly filters for persons?
      hard
      A. persons = [ent.text for ent in doc.ents if ent.label_ == 'PERSON']
      B. persons = [ent.text for ent in doc.ents if ent.label_ == 'ORG']
      C. persons = [ent.text for ent in doc.ents if ent.label_ == 'GPE']
      D. persons = [ent.text for ent in doc.ents if ent.label_ == 'MONEY']

      Solution

      1. Step 1: Identify label for persons in spaCy

        spaCy's English models use the 'PERSON' label for people's names.
      2. Step 2: Filter entities by 'PERSON'

        Filtering doc.ents by ent.label_ == 'PERSON' extracts only person names.
      3. Final Answer:

        persons = [ent.text for ent in doc.ents if ent.label_ == 'PERSON'] -> Option A
      4. Quick Check:

        Filter entities by 'PERSON' label [OK]
      Hint: Filter entities with label_ == 'PERSON' to get names [OK]
      Common Mistakes:
      • Using wrong labels like ORG or GPE for persons
      • Not filtering entities at all
      • Confusing entity text with label
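Below is a runnable version of option A, plus a small generalization that groups every entity under its label. As with a trained model's output, doc.ents holds Span objects. Here an entity_ruler with illustrative patterns stands in for en_core_web_sm, so the results are deterministic (spaCy v3 assumed).

```python
from collections import defaultdict

import spacy

# Assumption: rule-based stand-in for a trained NER model, with illustrative patterns.
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "PERSON", "pattern": "Ada Lovelace"},
    {"label": "PERSON", "pattern": "Alan Turing"},
    {"label": "ORG", "pattern": "Google"},
])

doc = nlp("Ada Lovelace and Alan Turing never worked at Google.")

# Option A: keep only entities labeled PERSON.
persons = [ent.text for ent in doc.ents if ent.label_ == "PERSON"]

# Generalization: collect every entity's text under its label.
by_label = defaultdict(list)
for ent in doc.ents:
    by_label[ent.label_].append(ent.text)

print(persons)         # ['Ada Lovelace', 'Alan Turing']
print(dict(by_label))  # {'PERSON': ['Ada Lovelace', 'Alan Turing'], 'ORG': ['Google']}
```

To keep several types at once, test membership instead of equality, for example ent.label_ in {"PERSON", "ORG"}.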