How to Use spaCy Pipeline in NLP: Syntax and Example

To use the spaCy pipeline in NLP, first load a language model with spacy.load(), then process text by calling the loaded model on your text string. The model's tokenizer splits the text into tokens, and then pipeline components such as the part-of-speech tagger, dependency parser, and named entity recognizer run in order. The result is a Doc object with rich linguistic annotations.

Syntax

The basic syntax to use the spaCy pipeline is:

import spacy: Import the spaCy library.
nlp = spacy.load('en_core_web_sm'): Load a pre-trained language model.
doc = nlp(text): Process your text through the pipeline to get a Doc object.

The Doc object contains tokens and annotations from the pipeline components.

python

import spacy

# Load the English language model
nlp = spacy.load('en_core_web_sm')

# Process text through the pipeline
doc = nlp('Hello world!')

Example

This example shows how to load the spaCy pipeline, process text, and access tokens and named entities detected by the pipeline.

python

import spacy

# Load the English model with pipeline components
nlp = spacy.load('en_core_web_sm')

# Text to process
text = 'Apple is looking at buying U.K. startup for $1 billion'

# Process the text
doc = nlp(text)

# Print tokens and their part-of-speech tags
for token in doc:
    print(f'{token.text} - {token.pos_}')

# Print named entities found
for ent in doc.ents:
    print(f'{ent.text} ({ent.label_})')

Output

Apple - PROPN
is - AUX
looking - VERB
at - ADP
buying - VERB
U.K. - PROPN
startup - NOUN
for - ADP
$ - SYM
1 - NUM
billion - NUM
Apple (ORG)
U.K. (GPE)
$1 billion (MONEY)

Common Pitfalls

Common mistakes when using the spaCy pipeline include:

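Calling the spacy module on text instead of a loaded model; the module itself is not callable, so this raises a TypeError.
Assuming the pipeline modifies the original text; the input string is left unchanged, and nlp() returns a new Doc object.
Forgetting to install the language model package (e.g., python -m spacy download en_core_web_sm); without it, spacy.load() raises an OSError.
Trying to read annotations such as token.pos_ or doc.ents before processing text; they exist only on the Doc returned by nlp().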

python

import spacy

# Wrong: calling the spacy module directly instead of a loaded model
# doc = spacy('Hello')  # TypeError: 'module' object is not callable

# Right: load a model first, then call it on the text
nlp = spacy.load('en_core_web_sm')
doc = nlp('Hello')

Quick Reference

Step	Code
1. Import spaCy	import spacy
2. Load model	nlp = spacy.load('en_core_web_sm')
3. Process text	doc = nlp('Your text here')
4. Access tokens	for token in doc: print(token.text)
5. Access entities	for ent in doc.ents: print(ent.text, ent.label_)

Key Takeaways

Always load a spaCy language model with spacy.load() before processing text.

Process text by calling the loaded model on your string to get a Doc object.

Use the Doc object to access tokens, parts of speech, and named entities.

Before first use, install the language model with python -m spacy download (e.g., python -m spacy download en_core_web_sm).

Read annotations such as tokens, tags, and entities only from the Doc returned by nlp(), after the model is loaded and the text is processed.