
How to Use spaCy for NLP: Quick Guide and Examples

To use spaCy for NLP, first install it and download a language model such as en_core_web_sm. Then load the model to create an nlp object, and call that object on your text to extract information such as tokens, parts of speech, and named entities.

📐

Syntax

Using spaCy involves loading a language model, creating an NLP pipeline object, and processing text with it. The main steps are:

import spacy: Import the spaCy library.
nlp = spacy.load('en_core_web_sm'): Load a small English model.
doc = nlp(text): Process your text to get a Doc object.
Use doc to access tokens, parts of speech, entities, and more.

python

import spacy

# Load the small English pipeline: tokenizer, tagger, parser and NER
# (the "sm" models do not ship with static word vectors)
nlp = spacy.load('en_core_web_sm')

# Process text
text = "Apple is looking at buying U.K. startup for $1 billion"
doc = nlp(text)

# Access tokens
for token in doc:
    print(token.text, token.pos_, token.dep_)

# Access named entities
for ent in doc.ents:
    print(ent.text, ent.label_)

💻

Example

This example shows how to load spaCy, process a sentence, and print tokens with their parts of speech and named entities.

python

import spacy

# Load the English model
nlp = spacy.load('en_core_web_sm')

# Sample text
text = "Google was founded in September 1998 by Larry Page and Sergey Brin."

# Process the text
doc = nlp(text)

# Print tokens with POS tags
print("Tokens and POS tags:")
for token in doc:
    print(f"{token.text}: {token.pos_}")

# Print named entities
print("\nNamed Entities:")
for ent in doc.ents:
    print(f"{ent.text} ({ent.label_})")

Output

Tokens and POS tags:
Google: PROPN
was: AUX
founded: VERB
in: ADP
September: PROPN
1998: NUM
by: ADP
Larry: PROPN
Page: PROPN
and: CCONJ
Sergey: PROPN
Brin: PROPN
.: PUNCT

Named Entities:
Google (ORG)
September 1998 (DATE)
Larry Page (PERSON)
Sergey Brin (PERSON)
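The tags in this output are short codes. As a small sketch, spaCy's built-in glossary can translate them into plain descriptions through spacy.explain, which needs no model download. The list of labels below is simply copied from the output above.

```python
import spacy

# Labels copied from the example output
labels = ["PROPN", "AUX", "ADP", "ORG", "PERSON", "DATE"]

# spacy.explain looks each label up in spaCy's built-in glossary;
# it returns None for labels the glossary does not know
descriptions = {label: spacy.explain(label) for label in labels}

for label, description in descriptions.items():
    print(f"{label}: {description}")
```

This is handy when a model returns a label you do not recognize, since the same call works for POS tags, dependency labels, and entity types.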

⚠️

Common Pitfalls

Common mistakes when using spaCy include:

Not installing the language model before loading it (run python -m spacy download en_core_web_sm).
Trying to process text without loading a model first.
Confusing Doc and Token objects.
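Ignoring that spaCy models are language-specific and that their predictions can change with capitalization (for example, lowercased names may not be recognized as entities).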

Always ensure the model is installed and loaded correctly before processing text.
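One way to guard against a missing model is to catch the OSError that spacy.load raises when the package is not installed. The sketch below assumes that falling back to a blank English pipeline is acceptable for your use case; load_pipeline is a hypothetical helper name, not part of spaCy.

```python
import spacy

def load_pipeline(name="en_core_web_sm"):
    """Load a spaCy pipeline, or fall back to a blank English one.

    Hypothetical helper for illustration; not part of spaCy itself.
    """
    try:
        return spacy.load(name)
    except OSError:
        # spacy.load raises OSError when the model package cannot be found
        print(f"Model '{name}' not found. Install it with:")
        print(f"    python -m spacy download {name}")
        # A blank pipeline only tokenizes: no POS tags, parse, or entities
        return spacy.blank("en")

nlp = load_pipeline()
doc = nlp("Hello world!")
print(nlp.pipe_names)                  # components available in this pipeline
print([token.text for token in doc])
```

With the fallback, token.pos_ is empty and doc.ents has no entities, so in a real project it may be better to stop with a clear error message instead.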

python

import spacy

# Wrong: Trying to process text without loading model
# doc = spacy(text)  # TypeError: 'module' object is not callable

# Right way:
nlp = spacy.load('en_core_web_sm')
doc = nlp("Hello world!")
print([token.text for token in doc])

Output

['Hello', 'world', '!']
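The Doc and Token confusion mentioned above is easier to see with a short sketch. It uses spacy.blank("en"), a tokenizer-only pipeline that needs no model download, so the variable names and the choice of a blank pipeline are assumptions made for illustration.

```python
import spacy
from spacy.tokens import Doc, Span, Token

# A blank pipeline is enough here: it tokenizes without a trained model
nlp = spacy.blank("en")
doc = nlp("Hello world!")

first = doc[0]      # integer index gives a single Token
phrase = doc[0:2]   # slice gives a Span (a view over several tokens)

print(type(doc).__name__, type(first).__name__, type(phrase).__name__)
print(len(doc))      # number of tokens, not characters
print(first.text)    # text of one token
print(phrase.text)   # text of the span, including whitespace between tokens
print(doc.text)      # the original string
```

Attributes such as pos_ and dep_ belong to Token objects, while ents belongs to the Doc, so checking which object you are holding usually explains an AttributeError.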

📊

Quick Reference

Command	Description
import spacy	Import the spaCy library
spacy.load('en_core_web_sm')	Load the English small model
nlp(text)	Process text to create a Doc object
for token in doc: token.text	Access tokens in the text
for ent in doc.ents: ent.text, ent.label_	Access named entities and their labels

✅

Key Takeaways

Always install and load a spaCy language model before processing text.

Use the nlp object to convert text into a Doc for easy analysis.

Tokens and named entities can be accessed directly from the Doc object.

Common errors include missing model installation and incorrect object usage.

spaCy provides fast, easy-to-use tools for many NLP tasks out of the box.