How to Use Hugging Face with PyTorch: A Simple Guide
To use Hugging Face with PyTorch, install the `transformers` library, load a pretrained model and tokenizer with `from_pretrained`, then pass PyTorch tensors to the model for training or inference. Hugging Face models are fully compatible with PyTorch and provide simple APIs for NLP tasks.
Syntax
Here is the basic syntax to load a Hugging Face model and tokenizer for use with PyTorch:
- `from transformers import AutoModelForSequenceClassification, AutoTokenizer`: Import the model and tokenizer classes.
- `tokenizer = AutoTokenizer.from_pretrained('model-name')`: Load the tokenizer for text processing.
- `model = AutoModelForSequenceClassification.from_pretrained('model-name')`: Load the pretrained PyTorch model.
- `inputs = tokenizer(text, return_tensors='pt')`: Convert text to PyTorch tensors.
- `outputs = model(**inputs)`: Run the model forward pass.
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
model = AutoModelForSequenceClassification.from_pretrained('bert-base-uncased')

text = "Hello, Hugging Face with PyTorch!"
inputs = tokenizer(text, return_tensors='pt')
outputs = model(**inputs)
```
Example
This example loads a DistilBERT model fine-tuned for sentiment classification (SST-2), tokenizes input text, and gets model predictions as PyTorch tensors.
```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load tokenizer and model
model_name = 'distilbert-base-uncased-finetuned-sst-2-english'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Prepare input text
text = "I love using Hugging Face with PyTorch!"
inputs = tokenizer(text, return_tensors='pt')

# Forward pass
outputs = model(**inputs)

# Get logits and predicted class
logits = outputs.logits
predicted_class = torch.argmax(logits).item()
print(f"Logits: {logits}")
print(f"Predicted class: {predicted_class}")
```
Output
Logits: tensor([[-3.2157,  3.2157]], grad_fn=<AddmmBackward0>)
Predicted class: 1
For this model, class 0 is NEGATIVE and class 1 is POSITIVE, so the positive input text maps to class 1.
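The raw logits can be converted into class probabilities with a softmax over the class dimension. A minimal sketch with hard-coded example logits (chosen for illustration, so it runs without downloading a model):

```python
import torch
import torch.nn.functional as F

# Hard-coded example logits standing in for outputs.logits
logits = torch.tensor([[-2.0, 4.0]])

# Softmax over the class dimension converts logits to probabilities
probs = F.softmax(logits, dim=-1)
predicted_class = torch.argmax(probs, dim=-1).item()

print(probs)            # probabilities summing to 1
print(predicted_class)  # index of the most likely class
```

The probabilities are often more useful than raw logits when you need a confidence score alongside the predicted label.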
Common Pitfalls
- Not using `return_tensors='pt'` in the tokenizer causes errors because the model expects PyTorch tensors.
- Forgetting to move the model and inputs to the same device (CPU or GPU) leads to device mismatch errors.
- Using the wrong model class for the task (e.g., loading a masked language model for classification) causes unexpected outputs.
- Not setting the model to evaluation mode (`model.eval()`) during inference can affect results.
```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = 'bert-base-uncased'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

text = "Test"

# Wrong: without return_tensors, the tokenizer returns Python lists, not tensors
inputs = tokenizer(text)  # Missing return_tensors='pt'
# outputs = model(**inputs)  # This would cause an error

# Correct: request PyTorch tensors
inputs = tokenizer(text, return_tensors='pt')
outputs = model(**inputs)
```
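The device-mismatch pitfall is fixed by moving the model and every input tensor to the same device. A minimal sketch using a plain `nn.Linear` as a stand-in model (the same `.to(device)` pattern applies to any Hugging Face model, whose tokenizer likewise returns a dict of tensors):

```python
import torch
import torch.nn as nn

# Pick the GPU if available, otherwise fall back to the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Stand-in model; a Hugging Face model is moved the same way
model = nn.Linear(4, 2).to(device)

# Tokenizer output is a dict of tensors; move each tensor to the model's device.
# With a real tokenizer you would move input_ids, attention_mask, etc.
inputs = {'features': torch.randn(1, 4)}
inputs = {k: v.to(device) for k, v in inputs.items()}

outputs = model(inputs['features'])
print(outputs.device)  # same device as the model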
Quick Reference
| Step | Command | Purpose |
|---|---|---|
| 1 | `pip install transformers torch` | Install required libraries |
| 2 | `from transformers import AutoModelForSequenceClassification, AutoTokenizer` | Import model and tokenizer classes |
| 3 | `tokenizer = AutoTokenizer.from_pretrained('model-name')` | Load tokenizer for text processing |
| 4 | `model = AutoModelForSequenceClassification.from_pretrained('model-name')` | Load pretrained PyTorch model |
| 5 | `inputs = tokenizer(text, return_tensors='pt')` | Convert text to PyTorch tensors |
| 6 | `outputs = model(**inputs)` | Run model forward pass |
| 7 | `model.eval()` | Set model to evaluation mode for inference |
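Step 7 matters because layers such as dropout behave differently in training and evaluation mode; pairing `model.eval()` with `torch.no_grad()` is the standard inference setup. A small sketch showing the effect on a toy module with a dropout layer:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(3, 3), nn.Dropout(p=0.5))
x = torch.ones(1, 3)

# Training mode: dropout randomly zeroes activations, so outputs can vary
model.train()
a = model(x)

# Eval mode: dropout is disabled, so outputs are deterministic
model.eval()
with torch.no_grad():  # also skips gradient tracking for speed and memory
    c = model(x)
    d = model(x)

print(torch.equal(c, d))  # True: eval-mode outputs are identical
```

The same two calls, `model.eval()` and `torch.no_grad()`, apply unchanged to Hugging Face models, since they are ordinary `nn.Module` subclasses.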
Key Takeaways
- Install the `transformers` library to access Hugging Face models compatible with PyTorch.
- Always call the tokenizer with `return_tensors='pt'` to get PyTorch tensors as input.
- Load the correct model class for your task, e.g., sequence classification for sentiment analysis.
- Move the model and inputs to the same device (CPU or GPU) to avoid errors.
- Call `model.eval()` during inference to get consistent results.
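For training rather than inference, the usual PyTorch loop applies: forward pass, loss, backward pass, optimizer step. A minimal sketch with a toy linear classifier standing in for a Hugging Face model (real fine-tuning would use the model's `outputs.logits`, or `outputs.loss` when labels are passed in):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy classifier standing in for a sequence-classification model
model = nn.Linear(8, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

# Fake batch: 4 examples, 8 features each, binary labels
x = torch.randn(4, 8)
labels = torch.tensor([0, 1, 0, 1])

model.train()
initial_loss = None
for step in range(20):
    optimizer.zero_grad()
    logits = model(x)               # with a HF model: model(**inputs).logits
    loss = loss_fn(logits, labels)
    loss.backward()
    optimizer.step()
    if initial_loss is None:
        initial_loss = loss.item()

print(initial_loss, loss.item())  # the loss should decrease over the steps
```

Because Hugging Face models are standard `nn.Module` subclasses, this loop carries over directly: swap in the pretrained model, feed it tokenized batches, and keep the optimizer and backward pass exactly as shown.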