
How to Use spaCy Pipeline in NLP: Syntax and Example

To use the spaCy pipeline in NLP, first load a language model with spacy.load(), then process text by calling the model on your text string. The pipeline automatically applies components like tokenization, tagging, parsing, and named entity recognition to produce a Doc object with rich linguistic annotations.

Syntax

The basic syntax to use the spaCy pipeline is:

  • import spacy: Import the spaCy library.
  • nlp = spacy.load('en_core_web_sm'): Load a pre-trained language model.
  • doc = nlp(text): Process your text through the pipeline to get a Doc object.

The Doc object contains tokens and annotations from the pipeline components.

python
import spacy

# Load the English language model
nlp = spacy.load('en_core_web_sm')

# Process text through the pipeline
doc = nlp('Hello world!')
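To make the idea of "pipeline components" concrete, here is a minimal sketch that builds a tiny pipeline by hand and lists its components. It assumes spaCy v3 (where add_pipe takes a component name as a string) and uses spacy.blank('en') with the rule-based sentencizer instead of en_core_web_sm, purely so that it runs without a model download; a trained model would list more components.

```python
import spacy

# Assumption: spaCy v3 API, where add_pipe() takes the component's string name.
# A blank pipeline has only the tokenizer and no trained components.
nlp = spacy.blank("en")
print(nlp.pipe_names)

# Add a rule-based sentence splitter as the only pipeline component
nlp.add_pipe("sentencizer")
print(nlp.pipe_names)

# Calling nlp() runs the tokenizer, then every component in order
doc = nlp("Hello world! This is spaCy.")
sentences = [sent.text for sent in doc.sents]
print(sentences)
```

On a loaded model such as en_core_web_sm, nlp.pipe_names can be inspected the same way to see which components will run on your text.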

Example

This example shows how to load the spaCy pipeline, process text, and access tokens and named entities detected by the pipeline.

python
import spacy

# Load the English model with pipeline components
nlp = spacy.load('en_core_web_sm')

# Text to process
text = 'Apple is looking at buying U.K. startup for $1 billion'

# Process the text
doc = nlp(text)

# Print tokens and their part-of-speech tags
for token in doc:
    print(f'{token.text} - {token.pos_}')

# Print named entities found
for ent in doc.ents:
    print(f'{ent.text} ({ent.label_})')
Output
Apple - PROPN
is - AUX
looking - VERB
at - ADP
buying - VERB
U.K. - PROPN
startup - NOUN
for - ADP
$ - SYM
1 - NUM
billion - NUM
Apple (ORG)
U.K. (GPE)
$1 billion (MONEY)

Common Pitfalls

Common mistakes when using the spaCy pipeline include:

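  • Processing text without loading a language model first; the spacy module itself is not callable, so this raises an error.
  • Assuming the pipeline modifies the original text; it returns a new Doc object instead and leaves the string unchanged.
  • Forgetting to install the language model package before loading it (install it with python -m spacy download en_core_web_sm).
  • Trying to read annotations such as tokens, tags, or entities before processing text; they exist only on the Doc returned by nlp(text).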
python
import spacy

# Wrong: processing text without loading a model
# doc = spacy('Hello')  # TypeError: 'module' object is not callable

# Right way:
nlp = spacy.load('en_core_web_sm')
doc = nlp('Hello')
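The missing-model pitfall can be handled explicitly, since spacy.load() raises OSError when the package is not installed. The following is a sketch under stated assumptions: load_pipeline is a hypothetical helper name, and falling back to spacy.blank('en') is one possible choice, not a recommendation from spaCy. The blank pipeline only tokenizes, so part-of-speech tags and entities will be empty in that case.

```python
import spacy

def load_pipeline(name="en_core_web_sm"):
    """Hypothetical helper: load a model, or fall back to a tokenizer-only pipeline."""
    try:
        return spacy.load(name)
    except OSError:
        # spacy.load raises OSError when the model package cannot be found
        print(f"Model '{name}' is not installed. "
              f"Install it with: python -m spacy download {name}")
        # Fallback: blank English pipeline (tokenizer only, no tagger or NER)
        return spacy.blank("en")

# Deliberately use a name that is not installed to exercise the fallback
nlp = load_pipeline("model_that_is_not_installed")
doc = nlp("Hello world!")
print([token.text for token in doc])
```

In a real script it may be preferable to stop with the install hint instead of falling back, so that missing annotations do not go unnoticed.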

Quick Reference

Step                 | Description
1. Import spaCy      | import spacy
2. Load model        | nlp = spacy.load('en_core_web_sm')
3. Process text      | doc = nlp('Your text here')
4. Access tokens     | for token in doc: print(token.text)
5. Access entities   | for ent in doc.ents: print(ent.text, ent.label_)

Key Takeaways

  • Always load a spaCy language model with spacy.load() before processing text.
  • Process text by calling the loaded model on your string to get a Doc object.
  • Use the Doc object to access tokens, parts of speech, and named entities.
  • Install required language models before use with python -m spacy download.
  • Read annotations only from the Doc returned by nlp(text); they do not exist before the text has been processed.