
How to Use spaCy with Transformers for NLP Tasks

You can use spaCy with transformer models by installing the spacy-transformers package and loading a transformer-based pipeline such as en_core_web_trf. This combines spaCy's simple pipeline API with contextual transformer embeddings for tasks like named entity recognition and text classification.

Syntax

To use spaCy with transformers, first install spacy-transformers (pip install spacy-transformers) and download a transformer pipeline (python -m spacy download en_core_web_trf). Then load the pipeline with spacy.load() and call nlp(text) to process text with it.

  • import spacy: Import the spaCy library.
  • nlp = spacy.load('en_core_web_trf'): Load the transformer pipeline.
  • doc = nlp(text): Process the text into a Doc with tokens, entities, etc.
python
import spacy

# Load transformer-based pipeline
nlp = spacy.load('en_core_web_trf')

# Process text
text = "Apple is looking at buying U.K. startup for $1 billion"
doc = nlp(text)

# Access named entities
for ent in doc.ents:
    print(ent.text, ent.label_)
Output
Apple ORG
U.K. GPE
$1 billion MONEY
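To confirm that a loaded pipeline really contains a transformer, you can inspect nlp.pipe_names. The sketch below is a hypothetical helper (transformer_components is not part of spaCy) and assumes the transformer component's name contains the word "transformer", as it does in en_core_web_trf.

```python
def transformer_components(nlp):
    """Return the names of pipeline components that look like transformers.

    Assumption: the transformer component's name contains "transformer",
    which is the case for en_core_web_trf. Other pipelines may differ.
    """
    return [name for name in nlp.pipe_names if "transformer" in name]


# Usage, assuming en_core_web_trf is installed:
#   import spacy
#   nlp = spacy.load("en_core_web_trf")
#   print(nlp.pipe_names)               # every component in the pipeline
#   print(transformer_components(nlp))  # an empty list means no transformer
```

The helper only reads pipe_names, so it works with any loaded spaCy pipeline and does not run the model.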

Example

This example notes the command for downloading the model, loads a transformer pipeline in spaCy, and extracts named entities from a sample sentence.

python
import spacy

# Make sure to install the model first by running:
# python -m spacy download en_core_web_trf

nlp = spacy.load('en_core_web_trf')

text = "Tesla plans to open a new factory in Berlin by 2023."
doc = nlp(text)

print("Named Entities:")
for ent in doc.ents:
    print(f"{ent.text} ({ent.label_})")
Output
Named Entities:
Tesla (ORG)
Berlin (GPE)
2023 (DATE)
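Each entity in doc.ents also carries its character offsets in the original text, which is handy when you need to highlight or store matches. The following is a minimal sketch; entity_rows is a hypothetical helper name, while text, label_, start_char, and end_char are attributes of spaCy entity spans.

```python
def entity_rows(doc):
    """Collect (text, label, start_char, end_char) for each entity in a Doc."""
    return [
        (ent.text, ent.label_, ent.start_char, ent.end_char)
        for ent in doc.ents
    ]


# Usage, assuming en_core_web_trf is installed:
#   import spacy
#   nlp = spacy.load("en_core_web_trf")
#   doc = nlp("Tesla plans to open a new factory in Berlin by 2023.")
#   for row in entity_rows(doc):
#       print(row)
```

Returning plain tuples keeps the result independent of the Doc object, so it can be written to a file or a database without further conversion.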

Common Pitfalls

A common mistake is forgetting to install the spacy-transformers package or to download the transformer model; spacy.load() then fails with an OSError because it cannot find the model. Another pitfall is loading a pipeline without transformer support, such as en_core_web_sm: it loads without error but contains no transformer component. Finally, transformer pipelines need more memory and CPU/GPU power, so expect slower processing than with the small models.
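The missing-install pitfall can be caught before calling spacy.load(). The sketch below uses only the standard library to check whether the relevant packages are importable. The helper name missing_requirements is hypothetical, and the default import names are assumptions (a downloaded spaCy model is installed as a Python package under its own name); adjust the tuple if your model release depends on a different transformer package.

```python
import importlib.util


def missing_requirements(names=("spacy", "spacy_transformers", "en_core_web_trf")):
    """Return the top-level import names from `names` that are not installed."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_requirements()
if missing:
    # Nothing is installed here; the message only tells you what to fix.
    print("Install these before loading the pipeline:", ", ".join(missing))
else:
    print("All required packages are importable.")
```

This check does not load the model itself, so it is cheap to run at program start.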

Wrong way (no transformer model):

nlp = spacy.load('en_core_web_sm')  # This is a small model without transformers

Right way (with transformer model):

nlp = spacy.load('en_core_web_trf')  # Transformer-based pipeline
python
import spacy

# Wrong: small model without transformers
nlp_wrong = spacy.load('en_core_web_sm')
doc_wrong = nlp_wrong("Apple is buying a startup.")
print("Entities with small model:")
for ent in doc_wrong.ents:
    print(ent.text, ent.label_)

# Right: transformer model
nlp_right = spacy.load('en_core_web_trf')
doc_right = nlp_right("Apple is buying a startup.")
print("\nEntities with transformer model:")
for ent in doc_right.ents:
    print(ent.text, ent.label_)
Output
Entities with small model:
Apple ORG

Entities with transformer model:
Apple ORG
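Because transformer pipelines are slower, it usually helps to process many texts in batches with nlp.pipe() rather than calling nlp(text) in a loop. The following is a minimal sketch; entities_for_texts is a hypothetical helper, and the batch size of 16 is an arbitrary starting value to tune for your hardware.

```python
def entities_for_texts(nlp, texts, batch_size=16):
    """Run texts through nlp.pipe in batches and return the entities per text."""
    results = []
    for doc in nlp.pipe(texts, batch_size=batch_size):
        results.append([(ent.text, ent.label_) for ent in doc.ents])
    return results


# Usage, assuming en_core_web_trf is installed:
#   import spacy
#   nlp = spacy.load("en_core_web_trf")
#   texts = ["Apple is buying a startup.", "Tesla plans a factory in Berlin."]
#   for text, ents in zip(texts, entities_for_texts(nlp, texts)):
#       print(text, ents)
```

nlp.pipe() yields Doc objects in the same order as the input texts, so the results line up with the original list.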

Quick Reference

Summary tips for using spaCy with transformers:

  • Install spacy-transformers and transformer models like en_core_web_trf.
  • Use spacy.load('en_core_web_trf') to get transformer features.
  • Expect slower but more accurate results than small models.
  • Use doc.ents to access named entities detected by transformers.
  • Check spaCy docs for supported transformer models and tasks.

Key Takeaways

Load transformer pipelines in spaCy using spacy.load('en_core_web_trf') for transformer-powered NLP.
Install spacy-transformers and the transformer model before usage to avoid errors.
Transformer pipelines provide better accuracy but require more resources and run slower than small models.
Use doc.ents and other spaCy attributes normally; transformers enhance their quality.
Pick a transformer-based pipeline when you want transformer benefits; small models such as en_core_web_sm do not provide them.