Model Pipeline - Tokenization in spaCy
This pipeline uses spaCy to split text into smaller units called tokens. A token is typically a word, a number, or a punctuation mark, and splitting text into tokens is the first step that lets later processing stages work with language.
spaCy's tokenizer is rule-based and is not trained, so there is no convergence chart to show for this step.
| Epoch | Loss ↓ | Accuracy ↑ | Observation |
|---|---|---|---|
| 1 | N/A | N/A | Tokenization is a rule-based process, so no training loss or accuracy applies. |
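One way to see that tokenization needs no trained weights is to run it on a blank pipeline. The sketch below assumes spaCy v3 is installed and uses `spacy.blank("en")`, which builds an English pipeline containing only the rule-based tokenizer, so no model download is involved. The variable names are illustrative.

```python
import spacy

# A blank English pipeline: tokenizer rules only, no trained components.
nlp = spacy.blank("en")

# No pipeline components are registered; the tokenizer always runs first.
component_names = nlp.pipe_names

doc = nlp("Hello, world!")
tokens = [token.text for token in doc]

print(component_names)
print(tokens)
```

Because the punctuation rules split the comma and the exclamation mark from the neighbouring words, the four tokens here should match what the full `en_core_web_sm` pipeline produces for the same input.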
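import spacy

# Load the small English pipeline (install it first with: python -m spacy download en_core_web_sm)
nlp = spacy.load('en_core_web_sm')
doc = nlp('Hello, world!')
# Collect the text of every token in the document
tokens = [token.text for token in doc]
print(tokens)  # ['Hello', ',', 'world', '!']

import spacy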
nlp = spacy.load('en_core_web_sm')
doc = nlp('Test sentence.')
# Print each token on its own line
for token in doc:
    print(token.text)
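Beyond the text itself, each token carries positional and lexical attributes. The sketch below is a minimal illustration, assuming spaCy v3 and a blank English pipeline so that it runs without a downloaded model. It lists each token's index in the document (`token.i`), its character offset in the original string (`token.idx`), and whether it is punctuation (`token.is_punct`), then rebuilds the original text from `token.text_with_ws`.

```python
import spacy

# Blank pipeline: tokenizer only, which is enough for these attributes.
nlp = spacy.blank("en")
text = "Test sentence."
doc = nlp(text)

# (token index, character offset, text, is punctuation?)
rows = [(token.i, token.idx, token.text, token.is_punct) for token in doc]
for row in rows:
    print(row)

# Tokenization keeps the whitespace information, so the input can be rebuilt.
reconstructed = "".join(token.text_with_ws for token in doc)
print(reconstructed == text)
```

The round trip at the end shows that spaCy's tokenization is non-destructive: the whitespace following each token is stored alongside it, so nothing from the input string is lost.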