
Hugging Face integration basics in PyTorch

Introduction
Hugging Face makes it easy to use powerful language models without building them from scratch.
Common scenarios:
- You want to quickly try a pre-trained language model for text classification.
- You need to generate text, such as writing a story or answering questions.
- You want to fine-tune a model on your own small dataset.
- You want to use state-of-the-art models without deep knowledge of their internals.
Syntax
PyTorch
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# Load tokenizer and model
model_name = 'distilbert-base-uncased-finetuned-sst-2-english'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Prepare input text
inputs = tokenizer('I love learning AI!', return_tensors='pt')

# Get model output (no gradients needed for inference)
with torch.no_grad():
    outputs = model(**inputs)

# Get prediction
predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
Use the Auto* classes (AutoTokenizer, AutoModelForSequenceClassification) to load pre-trained tokenizers and models easily.
Return tensors in PyTorch format with return_tensors='pt' for compatibility.
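The tokenizer returns a dict of tensors, and model(**inputs) works because ** unpacks that dict into keyword arguments. A minimal sketch of this mechanism, using hand-built illustrative token IDs (not real tokenizer output) and a dummy function in place of the model:

```python
import torch

# A stand-in for the dict a tokenizer returns with return_tensors='pt':
# 'input_ids' holds token IDs, 'attention_mask' marks real tokens vs padding.
# The ID values here are illustrative only.
inputs = {
    'input_ids': torch.tensor([[101, 1045, 2293, 102]]),
    'attention_mask': torch.tensor([[1, 1, 1, 1]]),
}

# ** unpacks the dict into keyword arguments, which is why model(**inputs) works.
def dummy_model(input_ids=None, attention_mask=None):
    # A real model would return logits; here we just echo the input shape.
    return input_ids.shape

print(dummy_model(**inputs))  # torch.Size([1, 4])
```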
Examples
Load a base BERT model and tokenizer for classification tasks.
PyTorch
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
model = AutoModelForSequenceClassification.from_pretrained('bert-base-uncased')
Tokenize text with padding and truncation for batch processing.
PyTorch
inputs = tokenizer('Hello world!', padding=True, truncation=True, return_tensors='pt')
Run the model on inputs and get raw prediction scores (logits).
PyTorch
outputs = model(**inputs)
logits = outputs.logits
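Once you have logits, picking a label is an argmax over the class dimension. A small sketch with hypothetical logit values (standing in for a real model's output):

```python
import torch

# Hypothetical logits for a batch of two sentences, two classes each
# (index 0 = negative, index 1 = positive).
logits = torch.tensor([[2.0, -1.0],
                       [-0.5, 1.5]])

labels = ['negative', 'positive']

# argmax over the last (class) dimension picks the highest-scoring class.
pred_ids = logits.argmax(dim=-1)
pred_labels = [labels[i] for i in pred_ids]
print(pred_labels)  # → ['negative', 'positive']
```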
Sample Model
This program loads a pre-trained sentiment analysis model, tokenizes input text, runs the model, and prints the predicted sentiment with confidence scores.
PyTorch
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

model_name = 'distilbert-base-uncased-finetuned-sst-2-english'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

text = 'I love learning AI!'
inputs = tokenizer(text, return_tensors='pt')

# Run inference without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)
predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)

labels = ['negative', 'positive']
predicted_label = labels[predictions.argmax().item()]

print(f'Text: {text}')
print(f'Predicted sentiment: {predicted_label}')
print(f'Confidence scores: {predictions.numpy()}')
Important Notes
Hugging Face models come with pre-trained weights ready to use.
Always use the matching tokenizer for the model to ensure correct input formatting.
Softmax converts raw scores to probabilities that sum to 1.
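The softmax property above is easy to verify with plain torch, using hypothetical logit values:

```python
import torch
import torch.nn.functional as F

# Hypothetical raw scores (logits) for one input and two classes.
logits = torch.tensor([[-2.3, 3.1]])

# Softmax along the class dimension turns scores into probabilities.
probs = F.softmax(logits, dim=-1)

print(probs)          # each value is in (0, 1)
print(probs.sum())    # always 1 (up to floating-point rounding)
```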
Summary
Hugging Face lets you load and use powerful language models easily.
Use AutoTokenizer and AutoModel classes to handle tokenization and model loading.
Run the model on tokenized inputs and interpret outputs with softmax for predictions.