NLP · How-To · Beginner · 3 min read

How to Use spaCy for NLP: Quick Guide and Examples

To use spaCy for NLP, install it, download a trained language model such as en_core_web_sm, and load it with spacy.load(). The loaded model is an nlp object: call it on text to get a Doc, from which you can extract tokens, part-of-speech tags, named entities, and more.

Syntax

Using spaCy involves loading a trained pipeline, which gives you an nlp object, and then calling that object on your text. The main steps are:

  • import spacy: Import the spaCy library.
  • nlp = spacy.load('en_core_web_sm'): Load a small English model.
  • doc = nlp(text): Process your text to get a Doc object.
  • Use doc to access tokens, parts of speech, entities, and more.
python
import spacy

# Load the small English pipeline (tokenizer, tagger, parser, NER); the sm model ships without static word vectors
nlp = spacy.load('en_core_web_sm')

# Process text
text = "Apple is looking at buying U.K. startup for $1 billion"
doc = nlp(text)

# Access tokens
for token in doc:
    print(token.text, token.pos_, token.dep_)

# Access named entities
for ent in doc.ents:
    print(ent.text, ent.label_)
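The pos_, dep_, and label_ values printed above are short codes. A minimal sketch for decoding them, assuming spaCy v3: spacy.explain() looks up a code in spaCy's built-in glossary and returns a plain-English description (or None for codes it doesn't know). The particular labels in the list are only an illustration.

```python
import spacy

# Example codes of the kind printed by token.pos_, token.dep_ and ent.label_
labels = ["PROPN", "AUX", "nsubj", "ORG", "GPE", "MONEY"]

# spacy.explain() reads spaCy's built-in glossary, so no trained model is needed
descriptions = {label: spacy.explain(label) for label in labels}

for label, description in descriptions.items():
    print(f"{label}: {description}")
```

This is handy when reading output from any model, since the same label sets are shared across spaCy's English pipelines.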

Example

This example loads a spaCy model, processes a sentence, and prints each token with its part-of-speech tag, followed by the named entities.

python
import spacy

# Load the English model
nlp = spacy.load('en_core_web_sm')

# Sample text
text = "Google was founded in September 1998 by Larry Page and Sergey Brin."

# Process the text
doc = nlp(text)

# Print tokens with POS tags
print("Tokens and POS tags:")
for token in doc:
    print(f"{token.text}: {token.pos_}")

# Print named entities
print("\nNamed Entities:")
for ent in doc.ents:
    print(f"{ent.text} ({ent.label_})")
Output
Tokens and POS tags:
Google: PROPN
was: AUX
founded: VERB
in: ADP
September: PROPN
1998: NUM
by: ADP
Larry: PROPN
Page: PROPN
and: CCONJ
Sergey: PROPN
Brin: PROPN
.: PUNCT

Named Entities:
Google (ORG)
September 1998 (DATE)
Larry Page (PERSON)
Sergey Brin (PERSON)
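The POS tags and entities in this output are predictions from the trained components of en_core_web_sm. To see which parts depend on the model, here is a sketch that processes the same sentence with spacy.blank("en"), a tokenizer-only English pipeline that needs no model download (assuming spaCy v3 is installed). Lexical attributes such as is_alpha, like_num, and is_punct are computed from the token text and still work, while pos_ is empty and doc.ents contains nothing.

```python
import spacy

# A blank English pipeline has only the tokenizer: no tagger, parser or NER
nlp = spacy.blank("en")
doc = nlp("Google was founded in September 1998 by Larry Page and Sergey Brin.")

# Lexical attributes come from the token text itself, so no trained model is needed
for token in doc:
    print(token.text, token.is_alpha, token.like_num, token.is_punct)

# Trained annotations are missing: pos_ is an empty string and there are no entities
print(repr(doc[0].pos_), list(doc.ents))
```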

Common Pitfalls

Common mistakes when using spaCy include:

  • Not installing the language model before loading it (run python -m spacy download en_core_web_sm).
  • Trying to process text without loading a model first.
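  • Confusing the Doc (the whole processed text) with the Token objects you get by iterating over or indexing it.
  • Forgetting that each model is language-specific (en_core_web_sm handles English only) and that its predictions, such as entity labels, can change with capitalization.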

Always ensure the model is installed and loaded correctly before processing text.
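One way to act on this, sketched under assumptions: spacy.load() raises an OSError when it can't find the named package, so a script can catch it, print the download command, and fall back to a blank English pipeline. The helper name load_pipeline is hypothetical, not part of spaCy.

```python
import spacy
from spacy.language import Language


def load_pipeline(name: str = "en_core_web_sm") -> Language:
    """Hypothetical helper: load a trained pipeline or fall back to a blank one."""
    try:
        return spacy.load(name)
    except OSError:
        # spacy.load() raises OSError if the package can't be found
        print(f"Model {name!r} not found. Install it with: python -m spacy download {name}")
        # Tokenizer-only fallback: POS tags, dependencies and entities will be empty
        return spacy.blank("en")


nlp = load_pipeline()
print(nlp.pipe_names)
```

The fallback keeps a script running but quietly drops tagging, parsing, and NER; in production code, letting the OSError propagate is often the safer choice.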

python
import spacy

# Wrong: the spacy module itself is not callable, so this raises a TypeError
# doc = spacy("Hello world!")

# Right way:
nlp = spacy.load('en_core_web_sm')
doc = nlp("Hello world!")
print([token.text for token in doc])
Output
['Hello', 'world', '!']
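To make the Doc/Token distinction from the pitfalls list concrete, here is a short sketch, assuming spaCy v3. It uses spacy.blank("en") so no model download is needed, since the object types don't depend on the trained components: indexing a Doc yields a Token, and slicing it yields a Span.

```python
import spacy

nlp = spacy.blank("en")  # tokenizer-only pipeline; enough to show the object types
doc = nlp("Apple is looking at buying U.K. startup for $1 billion")

token = doc[0]   # indexing a Doc gives a Token
span = doc[5:7]  # slicing a Doc gives a Span (a view over several tokens)

print(type(doc).__name__, len(doc))               # the whole processed text
print(type(token).__name__, token.text, token.i)  # a single token and its index
print(type(span).__name__, span.text)             # a multi-token view

# A Token keeps a reference to its parent Doc; token.text is a plain str
print(token.doc is doc)
```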

Quick Reference

| Command | Description |
|---|---|
| import spacy | Import the spaCy library |
| spacy.load('en_core_web_sm') | Load the small English model |
| nlp(text) | Process text to create a Doc object |
| for token in doc: token.text | Access the tokens in the text |
| for ent in doc.ents: ent.text, ent.label_ | Access named entities and their labels |
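The table covers processing one text at a time. For many texts, spaCy also provides nlp.pipe(), which takes an iterable of strings and yields Doc objects while batching the work internally. A minimal sketch using a blank English pipeline so it runs without a model download; with spacy.load('en_core_web_sm') the same loop would also give tags and entities. The batch_size value is an arbitrary illustration.

```python
import spacy

# Assumption: swap in spacy.load("en_core_web_sm") to get tags and entities too
nlp = spacy.blank("en")

texts = [
    "Apple is looking at buying U.K. startup for $1 billion",
    "Hello world!",
]

# nlp.pipe() streams Doc objects and processes texts in batches,
# which is usually faster than calling nlp(text) in a Python loop
docs = list(nlp.pipe(texts, batch_size=50))

for doc in docs:
    print(len(doc), [token.text for token in doc])
```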

Key Takeaways

Always install and load a spaCy language model before processing text.
Use the nlp object to convert text into a Doc for easy analysis.
Tokens and named entities can be accessed directly from the Doc object.
Common errors include a missing model installation and calling the wrong object (for example, the spacy module instead of nlp).
spaCy provides fast, easy-to-use tools for many NLP tasks out of the box.