How to Use spaCy for NLP: Quick Guide and Examples
To use spaCy for NLP, first install it and download a language model such as en_core_web_sm. Then load the model to create an nlp object, and call that object on your text to extract information such as tokens, parts of speech, and named entities.
Syntax
Using spaCy involves loading a language model, creating an NLP pipeline object, and processing text with it. The main steps are:
- import spacy: Import the spaCy library.
- nlp = spacy.load('en_core_web_sm'): Load a small English model.
- doc = nlp(text): Process your text to get a Doc object.
- Use doc to access tokens, parts of speech, entities, and more.
```python
import spacy

# Load the small English pipeline (tokenizer, tagger, parser, NER)
nlp = spacy.load('en_core_web_sm')

# Process text
text = "Apple is looking at buying U.K. startup for $1 billion"
doc = nlp(text)

# Access tokens: text, coarse part-of-speech tag, dependency label
for token in doc:
    print(token.text, token.pos_, token.dep_)

# Access named entities and their labels
for ent in doc.ents:
    print(ent.text, ent.label_)
```
Example
This example shows how to load spaCy, process a sentence, and print tokens with their parts of speech and named entities.
```python
import spacy

# Load the English model
nlp = spacy.load('en_core_web_sm')

# Sample text
text = "Google was founded in September 1998 by Larry Page and Sergey Brin."

# Process the text
doc = nlp(text)

# Print tokens with POS tags
print("Tokens and POS tags:")
for token in doc:
    print(f"{token.text}: {token.pos_}")

# Print named entities
print("\nNamed Entities:")
for ent in doc.ents:
    print(f"{ent.text} ({ent.label_})")
```
Output
Tokens and POS tags:
Google: PROPN
was: AUX
founded: VERB
in: ADP
September: PROPN
1998: NUM
by: ADP
Larry: PROPN
Page: PROPN
and: CCONJ
Sergey: PROPN
Brin: PROPN
.: PUNCT
Named Entities:
Google (ORG)
September 1998 (DATE)
Larry Page (PERSON)
Sergey Brin (PERSON)
Common Pitfalls
Common mistakes when using spaCy include:
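- Not installing the language model before loading it (run python -m spacy download en_core_web_sm).
- Trying to process text without loading a model first.
- Confusing Doc and Token objects: a Doc is the whole processed text, and a Token is a single item inside it.
- Ignoring that spaCy models are language-specific and sensitive to capitalization, so lowercasing the input can change the predicted tags and entities.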
Always ensure the model is installed and loaded correctly before processing text.
```python
import spacy

# Wrong: calling the spacy module directly instead of a loaded pipeline
# doc = spacy(text)  # TypeError: 'module' object is not callable

# Right: load a model first, then call the returned nlp object
nlp = spacy.load('en_core_web_sm')
doc = nlp("Hello world!")
print([token.text for token in doc])
```
Output
['Hello', 'world', '!']
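The missing-model pitfall can also be handled in code. The sketch below is one possible approach, not an official spaCy pattern: load_pipeline is a hypothetical helper name, and falling back to a blank English pipeline is an assumption made here for illustration. It relies on spacy.load raising OSError when the named package is not installed.

```python
import spacy


def load_pipeline(name: str = "en_core_web_sm"):
    """Hypothetical helper: load a trained pipeline, or fall back to a blank one."""
    try:
        return spacy.load(name)
    except OSError:
        # spacy.load raises OSError when the model package cannot be found
        print(f"Model '{name}' not found. Install it with: python -m spacy download {name}")
        # Assumed fallback: a blank English pipeline tokenizes text
        # but provides no POS tags, dependencies, or named entities
        return spacy.blank("en")


nlp = load_pipeline()
doc = nlp("Hello world!")
print([token.text for token in doc])
```

The fallback keeps tokenization working, but token.pos_ will be empty and doc.ents will contain nothing until the trained model is installed, so in most projects failing with a clear error message is the safer choice.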
Quick Reference
| Command | Description |
|---|---|
| import spacy | Import the spaCy library |
| spacy.load('en_core_web_sm') | Load the English small model |
| nlp(text) | Process text to create a Doc object |
| for token in doc: token.text | Access tokens in the text |
| for ent in doc.ents: ent.text, ent.label_ | Access named entities and their labels |
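Since the table rows work with Doc, Token, and entity objects, a short sketch of how those types relate may help avoid mixing them up. It assumes a blank English pipeline (tokenizer only), which is enough to show the types without downloading a model; the variable names are illustrative.

```python
import spacy
from spacy.tokens import Doc, Span, Token

# Assumption: a blank pipeline is used only to demonstrate the object types
nlp = spacy.blank("en")
doc = nlp("Hello world!")

first = doc[0]     # indexing a Doc returns a single Token
phrase = doc[0:2]  # slicing a Doc returns a Span (entities in doc.ents are Spans too)

print(isinstance(doc, Doc), len(doc))         # True 3
print(isinstance(first, Token), first.text)   # True Hello
print(isinstance(phrase, Span), phrase.text)  # True Hello world
```

Attributes such as pos_ and dep_ belong to a Token, while label_ belongs to an entity Span, which is why the two loops in the table read different attributes.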
Key Takeaways
Always install and load a spaCy language model before processing text.
Use the nlp object to convert text into a Doc for easy analysis.
Tokens and named entities can be accessed directly from the Doc object.
Common errors include missing model installation and incorrect object usage.
spaCy provides fast, easy-to-use tools for many NLP tasks out of the box.
