NER with spaCy in NLP

Introduction

Named Entity Recognition (NER) automatically finds and labels names of people, places, organizations, and similar items in text. It turns free-form text into structured data that programs can search, filter, and count.

Common uses:

Extracting names of people from news articles.
Finding locations mentioned in travel blogs.
Identifying dates and times in emails.
Pulling out company names from financial reports.
Highlighting product names in customer reviews.
Syntax
NLP
import spacy

# Load a pre-trained model
nlp = spacy.load('en_core_web_sm')

# Process text
text = "Apple is looking at buying U.K. startup for $1 billion"
doc = nlp(text)

# Extract entities
for ent in doc.ents:
    print(ent.text, ent.label_)

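Use spacy.load() to load a pre-trained pipeline that includes an NER component. The model must be installed first, for example with python -m spacy download en_core_web_sm.

Access the entities through doc.ents. Each entity has text (the matched string) and label_ (the entity type as a string).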
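Besides text and label_, each entity span in spaCy also exposes start_char and end_char, the character offsets of the entity in the original string. The sketch below shows how those offsets could be collected for highlighting. The helper names entity_rows and fake_span are hypothetical, not part of spaCy. The stand-in objects only mimic four span attributes so the snippet runs without a downloaded model, and the labels are hand-written, not model output.

```python
from types import SimpleNamespace


def entity_rows(ents):
    """Return (text, label, start_char, end_char) for each entity-like object."""
    return [(e.text, e.label_, e.start_char, e.end_char) for e in ents]


def fake_span(source, phrase, label):
    """Hand-built stand-in that mimics four attributes of a spaCy entity span."""
    start = source.index(phrase)
    return SimpleNamespace(
        text=phrase,
        label_=label,
        start_char=start,
        end_char=start + len(phrase),
    )


text = "Apple is looking at buying U.K. startup for $1 billion"

# Hand-written sample entities for illustration, not actual model output.
sample_ents = [
    fake_span(text, "Apple", "ORG"),
    fake_span(text, "$1 billion", "MONEY"),
]

rows = entity_rows(sample_ents)
for ent_text, label, start, end in rows:
    # Slicing the original string with the offsets gives back the entity text.
    print(f"{label:<6} {start:>3}-{end:<3} {text[start:end]}")
```

With a loaded pipeline, entity_rows(nlp(text).ents) should return rows of the same shape for the entities the model actually finds.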

Examples
Extract a person and a location from a simple sentence.
NLP
# Reuses the nlp pipeline loaded in the Syntax section
doc = nlp("Barack Obama was born in Hawaii.")
for ent in doc.ents:
    print(ent.text, ent.label_)
Collect all entities as a list of (text, label) tuples.
NLP
doc = nlp("Amazon plans to open a new office in Seattle in 2024.")
entities = [(ent.text, ent.label_) for ent in doc.ents]
print(entities)
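Once entities are in (text, label) tuples, a common next step is to group them by label. This is a minimal sketch using only the standard library. The function name group_by_label is hypothetical, and the entities list is a hand-written sample in the same shape as the list above, not actual model output.

```python
from collections import defaultdict


def group_by_label(entities):
    """Group (text, label) tuples into a dict of label -> list of texts."""
    grouped = defaultdict(list)
    for text, label in entities:
        grouped[label].append(text)
    return dict(grouped)


# Hand-written sample for illustration, not actual model output.
entities = [
    ("Amazon", "ORG"),
    ("Seattle", "GPE"),
    ("2024", "DATE"),
    ("Google", "ORG"),
]

grouped = group_by_label(entities)
for label, texts in grouped.items():
    print(label, texts)
```

Within each label, the texts keep the order in which they appeared in the input list.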
Sample Model

This program finds names of people, organizations, and places in the text.

NLP
import spacy

# Load English model with NER
nlp = spacy.load('en_core_web_sm')

# Sample text
text = "Google was founded by Larry Page and Sergey Brin while they were Ph.D. students at Stanford University."

# Process text
doc = nlp(text)

# Print entities found
for ent in doc.ents:
    print(f"Entity: {ent.text}, Type: {ent.label_}")
Important Notes

spaCy's pre-trained English models recognize common entity types such as PERSON, ORG (organizations), GPE (countries, cities, states), DATE, and MONEY.

NER works best on well-formed text; slang or typos may reduce accuracy.

You can train spaCy on your own data to recognize custom entities if needed.

Summary

NER finds important names and terms in text automatically.

spaCy makes NER easy with pre-trained models and simple code.

Extracted entities help computers understand text better for many applications.

Practice

1. What does NER (Named Entity Recognition) do in natural language processing?
easy
A. It generates new text based on input prompts.
B. It translates text from one language to another.
C. It summarizes long documents into short paragraphs.
D. It finds and labels important names and terms in text automatically.

Solution

  1. Step 1: Understand NER's purpose

    NER identifies specific names like people, places, or organizations in text.
  2. Step 2: Compare with other NLP tasks

    Translation, summarization, and text generation are different tasks from NER.
  3. Final Answer:

    It finds and labels important names and terms in text automatically. -> Option D
  4. Quick Check:

    NER = Finds names and terms [OK]
Hint: NER extracts names and terms, not translations or summaries [OK]
Common Mistakes:
  • Confusing NER with translation or summarization
  • Thinking NER generates new text
  • Believing NER only finds keywords, not named entities
2. Which of the following is the correct way to load a pre-trained spaCy model for NER?
easy
A. import spacy; nlp = spacy.load('en_core_web_sm')
B. import spacy; nlp = spacy.model('en_core_web_sm')
C. import spacy; nlp = spacy.load_model('en_core_web_sm')
D. import spacy; nlp = spacy.get('en_core_web_sm')

Solution

  1. Step 1: Recall spaCy model loading syntax

    spaCy uses spacy.load('model_name') to load pre-trained models.
  2. Step 2: Check each option

    Only import spacy; nlp = spacy.load('en_core_web_sm') uses spacy.load correctly; others use invalid functions.
  3. Final Answer:

    import spacy; nlp = spacy.load('en_core_web_sm') -> Option A
  4. Quick Check:

    spaCy model loading = spacy.load() [OK]
Hint: Use spacy.load('model_name') to load models [OK]
Common Mistakes:
  • Using spacy.model or spacy.load_model which don't exist
  • Trying spacy.get which is not a spaCy function
  • Forgetting to import spacy before loading
3. Given this code snippet using spaCy for NER:
import spacy
nlp = spacy.load('en_core_web_sm')
doc = nlp('Apple is looking at buying U.K. startup for $1 billion')
entities = [(ent.text, ent.label_) for ent in doc.ents]
print(entities)

What will be the output?
medium
A. [('Apple', 'PERSON'), ('U.K.', 'ORG'), ('$1 billion', 'QUANTITY')]
B. [('Apple', 'ORG'), ('startup', 'ORG'), ('$1 billion', 'MONEY')]
C. [('Apple', 'ORG'), ('U.K.', 'GPE'), ('$1 billion', 'MONEY')]
D. [('Apple', 'GPE'), ('U.K.', 'GPE'), ('$1 billion', 'MONEY')]

Solution

  1. Step 1: Understand spaCy NER labels

    Apple is recognized as an organization (ORG), U.K. as geopolitical entity (GPE), and $1 billion as money (MONEY).
  2. Step 2: Match entities with labels

    [('Apple', 'ORG'), ('U.K.', 'GPE'), ('$1 billion', 'MONEY')] matches the entities and labels that en_core_web_sm typically produces for this sentence; exact results can vary between model versions.
  3. Final Answer:

    [('Apple', 'ORG'), ('U.K.', 'GPE'), ('$1 billion', 'MONEY')] -> Option C
  4. Quick Check:

    spaCy NER output matches [('Apple', 'ORG'), ('U.K.', 'GPE'), ('$1 billion', 'MONEY')] [OK]
Hint: Check spaCy's common entity labels for correct matches [OK]
Common Mistakes:
  • Confusing ORG with PERSON or GPE
  • Mislabeling MONEY as QUANTITY
  • Including words like 'startup' as entities
4. You run this code but get an error:
import spacy
doc = nlp('Google is a tech giant')

What is the most likely cause?
medium
A. spaCy does not support the word 'Google'.
B. The variable 'nlp' is not defined before use.
C. The text input is too short for NER.
D. Missing parentheses in the print statement.

Solution

  1. Step 1: Check variable definitions

    The code uses 'nlp' without defining it by loading a spaCy model first.
  2. Step 2: Identify error cause

    This causes a NameError because 'nlp' is undefined.
  3. Final Answer:

    The variable 'nlp' is not defined before use. -> Option B
  4. Quick Check:

    Undefined variable 'nlp' causes error [OK]
Hint: Always load model with spacy.load before using nlp [OK]
Common Mistakes:
  • Assuming text length causes error
  • Thinking spaCy can't recognize common words
  • Confusing print syntax errors with variable errors
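The error in question 4 can be reproduced without installing anything, because Python raises the NameError before spaCy is ever called. A minimal sketch, where the variable name error_message is only illustrative:

```python
# No model is loaded here on purpose, so the name nlp does not exist yet.
try:
    doc = nlp("Google is a tech giant")
except NameError as err:
    error_message = str(err)
    print(f"NameError: {error_message}")
```

The fix is to define nlp first with nlp = spacy.load('en_core_web_sm'), as in the Syntax section.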
5. You want to extract only person names from a text using spaCy's NER. Which code snippet correctly filters for persons?
hard
A. persons = [ent.text for ent in doc.ents if ent.label_ == 'PERSON']
B. persons = [ent.text for ent in doc.ents if ent.label_ == 'ORG']
C. persons = [ent.text for ent in doc.ents if ent.label_ == 'GPE']
D. persons = [ent.text for ent in doc.ents if ent.label_ == 'MONEY']

Solution

  1. Step 1: Identify label for persons in spaCy

    spaCy uses the 'PERSON' label for people's names.
  2. Step 2: Filter entities by 'PERSON'

    Filtering doc.ents by ent.label_ == 'PERSON' extracts only person names.
  3. Final Answer:

    persons = [ent.text for ent in doc.ents if ent.label_ == 'PERSON'] -> Option A
  4. Quick Check:

    Filter entities by 'PERSON' label [OK]
Hint: Filter entities with label_ == 'PERSON' to get names [OK]
Common Mistakes:
  • Using wrong labels like ORG or GPE for persons
  • Not filtering entities at all
  • Confusing entity text with label
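The filter from question 5 can be generalized to any set of labels. This is a minimal sketch. The helper texts_with_labels and the Ent stand-in are hypothetical names, not part of spaCy. Ent only mimics the text and label_ attributes so the snippet runs without a model, and the sample labels are hand-written rather than model output.

```python
from collections import namedtuple

# Stand-in with the two attributes the filter needs (illustrative only).
Ent = namedtuple("Ent", ["text", "label_"])


def texts_with_labels(ents, wanted):
    """Return the text of every entity whose label_ is in wanted."""
    wanted = set(wanted)
    return [e.text for e in ents if e.label_ in wanted]


# Hand-written sample for illustration, not actual model output.
sample = [
    Ent("Google", "ORG"),
    Ent("Larry Page", "PERSON"),
    Ent("Sergey Brin", "PERSON"),
    Ent("Stanford University", "ORG"),
]

people = texts_with_labels(sample, {"PERSON"})
print(people)
```

With a loaded pipeline, texts_with_labels(doc.ents, {"PERSON"}) should behave like the list comprehension in option A.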