How to Use FastText Python for NLP Tasks
To use fastText in Python for NLP, install the fasttext library, then train a model with fasttext.train_supervised() or load an existing one with fasttext.load_model(). Use the model's predict() method to classify text, or get_word_vector() to obtain word vectors for text representation.
Syntax
The main steps to use FastText in Python are:
- import fasttext: Import the library.
- model = fasttext.train_supervised(input='data.txt'): Train a supervised model on labeled data.
- model = fasttext.load_model('model.bin'): Load a pre-trained model.
- labels, probabilities = model.predict(text): Predict labels for new text.
- vector = model.get_word_vector(word): Get the vector representation of a word.
```python
import fasttext

# Train a supervised model
model = fasttext.train_supervised(input='data.txt')

# Load a saved model
model = fasttext.load_model('model.bin')

# Predict label for a sentence
labels, probabilities = model.predict('example text')

# Get word vector
vector = model.get_word_vector('example')
```
Example
This example shows how to train a simple FastText supervised model on labeled data, save it, load it back, and predict labels for new sentences.
```python
import fasttext

# Create a small training file with labels
with open('train.txt', 'w') as f:
    f.write('__label__greeting Hello world\n')
    f.write('__label__farewell Goodbye world\n')

# Train the model
model = fasttext.train_supervised(input='train.txt', epoch=5, lr=1.0)

# Save the model
model.save_model('model.bin')

# Load the model
loaded_model = fasttext.load_model('model.bin')

# Predict labels for new text
labels, probabilities = loaded_model.predict('Hello there')
print('Labels:', labels)
print('Probabilities:', probabilities)
```
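Output
Labels: ('__label__greeting',)
Probabilities: [0.99999994]
predict() returns the labels as a tuple and the probabilities as a NumPy array. The exact probability depends on the training run, so the value you see will likely differ, especially with a training file this small.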
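The labels returned by predict() still carry the __label__ prefix, so it is often convenient to strip it before displaying results. The following is a minimal sketch of one way to do that; format_predictions is a hypothetical helper, not part of the fasttext library, and the sample values are made up to mimic the shape of a predict() result rather than taken from a real model.

```python
LABEL_PREFIX = "__label__"  # fastText's default label prefix


def format_predictions(labels, probabilities):
    """Pair each predicted label (prefix removed) with its probability as a float."""
    results = []
    for label, prob in zip(labels, probabilities):
        if label.startswith(LABEL_PREFIX):
            name = label[len(LABEL_PREFIX):]
        else:
            name = label
        results.append((name, float(prob)))
    return results


# Stand-in values shaped like the result of model.predict('Hello there', k=2);
# the numbers are invented for illustration.
labels = ("__label__greeting", "__label__farewell")
probabilities = [0.75, 0.25]

print(format_predictions(labels, probabilities))
# prints [('greeting', 0.75), ('farewell', 0.25)]
```

Because the helper only relies on iteration, it should work the same way whether the probabilities arrive as a list or as an array.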
Common Pitfalls
Common mistakes when using FastText in Python include:
- Not formatting training data correctly: Each line must start with __label__labelname followed by the text.
- Using unlabeled data for supervised training causes errors.
- Forgetting to save the model after training.
- Loading a model with load_model before training or saving it.
- Confusing fasttext (the official library) with fasttext wrappers or older versions.
```python
import fasttext

# Wrong: training data without labels
with open('bad_train.txt', 'w') as f:
    f.write('Hello world\n')

# This will raise an error
# model = fasttext.train_supervised(input='bad_train.txt')

# Right: training data with labels
with open('good_train.txt', 'w') as f:
    f.write('__label__greet Hello world\n')

model = fasttext.train_supervised(input='good_train.txt')
```
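Formatting mistakes are easier to catch before training than after. The sketch below shows one possible pre-flight check in plain Python; to_training_line and find_bad_lines are hypothetical helper names, not fasttext functions, and the check assumes the convention described above (label first, then text) with labels that contain no spaces.

```python
LABEL_PREFIX = "__label__"  # fastText's default label prefix


def to_training_line(label, text):
    """Build one training line: prefixed label, a space, then single-line text."""
    clean_text = " ".join(text.split())  # collapse newlines and repeated whitespace
    return f"{LABEL_PREFIX}{label} {clean_text}"


def find_bad_lines(lines):
    """Return 1-based numbers of non-empty lines lacking a leading label or any text."""
    bad = []
    for number, line in enumerate(lines, start=1):
        tokens = line.split()
        if not tokens:
            continue  # blank lines are skipped, not reported
        first = tokens[0]
        has_label = first.startswith(LABEL_PREFIX) and len(first) > len(LABEL_PREFIX)
        has_text = any(not t.startswith(LABEL_PREFIX) for t in tokens[1:])
        if not (has_label and has_text):
            bad.append(number)
    return bad


lines = [
    to_training_line("greeting", "Hello\nworld"),
    "Hello world",  # missing label
]
print(lines[0])               # prints __label__greeting Hello world
print(find_bad_lines(lines))  # prints [2]
```

Running a check like this over the training file and fixing the reported lines first avoids discovering the problem only when training fails.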
Quick Reference
FastText Python key functions:
| Function | Description |
|---|---|
| fasttext.train_supervised(input, epoch=5, lr=0.1) | Train a supervised text classification model from labeled data file |
| fasttext.load_model(path) | Load a saved FastText model from disk |
| model.predict(text, k=1) | Predict top k labels for given text |
| model.get_word_vector(word) | Get vector representation of a word |
| model.save_model(path) | Save the trained model to disk |
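A common use of the vectors returned by model.get_word_vector() is comparing two words by cosine similarity. The snippet below is a small sketch in plain Python; cosine_similarity is a hypothetical helper, not part of fasttext, and the vectors are stand-ins rather than real model output.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(float(x) * float(y) for x, y in zip(a, b))
    norm_a = math.sqrt(sum(float(x) ** 2 for x in a))
    norm_b = math.sqrt(sum(float(y) ** 2 for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Stand-in vectors; in practice these would come from
# model.get_word_vector('hello') and model.get_word_vector('hi').
v1 = [1.0, 0.0, 1.0]
v2 = [1.0, 0.0, 0.0]

print(round(cosine_similarity(v1, v2), 4))  # prints 0.7071
```

Values close to 1 indicate vectors pointing in similar directions, and values near 0 indicate little relation.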
Key Takeaways
- Format training data with labels starting with __label__ for supervised training.
- Use fasttext.train_supervised() to train and fasttext.load_model() to load models.
- Use model.predict() to classify new text and get labels with probabilities.
- Save your trained model with model.save_model() to reuse later.
- Get word vectors with model.get_word_vector() for text representation tasks.
