How to Use BERT for Text Classification in NLP
To use BERT for text classification in NLP, first tokenize your text with a BERT tokenizer, then feed the tokens into a pre-trained BERT model with a classification head. Finally, train or fine-tune the model on your labeled dataset to predict text categories.
Syntax
Here is the basic syntax to use BERT for text classification:
- Tokenizer: Converts raw text into tokens BERT understands.
- Model: Pre-trained BERT with a classification layer on top.
- Input: Tokenized text with attention masks.
- Output: Logits (one raw score per class), which softmax turns into class probabilities and argmax turns into a predicted label.
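The Input item becomes clearer with a concrete batch. The sketch below uses only the standard library to show what padding and attention masks look like for two sentences that have already been tokenized. The helper name pad_and_mask is hypothetical; the Hugging Face tokenizer does this step for you when you call it with padding=True. IDs 101, 102 and 0 are the [CLS], [SEP] and [PAD] IDs in the bert-base-uncased vocabulary. The other IDs are made-up placeholders.

```python
# Hypothetical helper: pads token-ID lists to a common length and builds
# the matching attention masks (1 = real token, 0 = padding).
def pad_and_mask(batch, pad_id=0):
    max_len = max(len(seq) for seq in batch)
    input_ids = [seq + [pad_id] * (max_len - len(seq)) for seq in batch]
    attention_mask = [[1] * len(seq) + [0] * (max_len - len(seq)) for seq in batch]
    return input_ids, attention_mask


# 101 = [CLS], 102 = [SEP] in bert-base-uncased; 2001-2003 are placeholder IDs.
batch = [[101, 2001, 2002, 102], [101, 2003, 102]]
input_ids, attention_mask = pad_and_mask(batch)
print(input_ids)       # [[101, 2001, 2002, 102], [101, 2003, 102, 0]]
print(attention_mask)  # [[1, 1, 1, 1], [1, 1, 1, 0]]
```

The attention mask tells BERT to ignore the trailing [PAD] position in the second sequence.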
```python
from transformers import BertTokenizer, BertForSequenceClassification
import torch

# Load pre-trained tokenizer and model
# (the classification head is newly initialized and still needs fine-tuning)
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)

# Example text
text = "This is a great movie!"

# Tokenize input
inputs = tokenizer(text, return_tensors='pt', padding=True, truncation=True, max_length=512)

# Forward pass (no gradients needed for inference)
with torch.no_grad():
    outputs = model(**inputs)

# Get logits (raw predictions)
logits = outputs.logits
```
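The logits come from the classification layer that BertForSequenceClassification places on top of BERT. That layer is a linear map from the pooled [CLS] representation (hidden size 768 for bert-base-uncased) to one score per label, which is why num_labels sets the number of logits. The stdlib sketch below imitates only that final linear step, using tiny made-up numbers. The function name classification_head and the 3-dimensional "pooled" vector are illustrative assumptions. The real layer also applies dropout during training.

```python
# Illustrative only: a linear classification head computing
# logits[j] = sum_i(weights[j][i] * pooled[i]) + bias[j] for each label j.
def classification_head(pooled, weights, bias):
    return [
        sum(w * h for w, h in zip(row, pooled)) + b
        for row, b in zip(weights, bias)
    ]


pooled = [0.5, -1.0, 2.0]          # stand-in for the pooled [CLS] vector
weights = [[1.0, 0.0, 0.5],        # one row per label (num_labels = 2)
           [-1.0, 1.0, 0.0]]
bias = [0.5, -0.5]
logits = classification_head(pooled, weights, bias)
print(logits)  # [2.0, -2.0]
```

Fine-tuning adjusts these weights together with the BERT layers beneath them.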
Example
This example shows how to fine-tune BERT on a small dataset for binary text classification using PyTorch and Hugging Face Transformers.
```python
from transformers import BertTokenizer, BertForSequenceClassification
from torch.optim import AdamW  # use PyTorch's AdamW; transformers.AdamW is deprecated
from torch.utils.data import DataLoader, Dataset
import torch

# Sample dataset
class SimpleDataset(Dataset):
    def __init__(self, texts, labels, tokenizer):
        self.texts = texts
        self.labels = labels
        self.tokenizer = tokenizer

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, idx):
        encoding = self.tokenizer(self.texts[idx], truncation=True, padding='max_length',
                                  max_length=64, return_tensors='pt')
        # Drop the batch dimension added by return_tensors='pt'
        item = {key: val.squeeze(0) for key, val in encoding.items()}
        item['labels'] = torch.tensor(self.labels[idx])
        return item

# Data
texts = ["I love this movie", "This film is terrible"]
labels = [1, 0]  # 1=positive, 0=negative

# Load tokenizer and model
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)

# Dataset and DataLoader
dataset = SimpleDataset(texts, labels, tokenizer)
dataloader = DataLoader(dataset, batch_size=2)

# Optimizer
optimizer = AdamW(model.parameters(), lr=5e-5)

# Training loop (1 epoch for demo)
model.train()
for batch in dataloader:
    optimizer.zero_grad()
    outputs = model(input_ids=batch['input_ids'],
                    attention_mask=batch['attention_mask'],
                    labels=batch['labels'])
    loss = outputs.loss
    loss.backward()
    optimizer.step()
    print(f"Loss: {loss.item():.4f}")

# Prediction example
model.eval()
with torch.no_grad():
    test_text = "I really enjoyed this movie"
    inputs = tokenizer(test_text, return_tensors='pt', truncation=True, padding=True, max_length=64)
    outputs = model(**inputs)
    probs = torch.nn.functional.softmax(outputs.logits, dim=1)
    print(f"Prediction probabilities: {probs}")
```
Output
Exact numbers will differ from run to run because the classification head is randomly initialized.
Loss: 0.6931
Prediction probabilities: tensor([[0.4987, 0.5013]])
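A loss of about 0.6931 is ln 2. That is the cross-entropy of a two-class model that gives both classes equal probability, which is roughly what a freshly initialized head produces. The stdlib sketch below computes cross-entropy from logits for a single example to show the connection. It is a hand-written illustration of the formula. The library itself computes this loss with torch.nn.CrossEntropyLoss when you pass labels.

```python
import math


# Cross-entropy for one example: -log(softmax(logits)[label]),
# computed with the log-sum-exp trick for numerical stability.
def cross_entropy(logits, label):
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


print(round(cross_entropy([0.0, 0.0], 1), 4))   # 0.6931 (= ln 2, uniform guess)
print(round(cross_entropy([2.0, -2.0], 0), 4))  # 0.0181 (confident and correct)
```

As fine-tuning progresses, the loss should fall below ln 2 on the training data.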
Common Pitfalls
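- Not tokenizing input properly: BERT takes token IDs (plus an attention mask when a batch is padded), not raw strings; passing text directly raises an error.
- Ignoring max length: Inputs longer than BERT's maximum sequence length (512 tokens for standard checkpoints such as bert-base-uncased) must be truncated.
- Not fine-tuning: The classification head of BertForSequenceClassification starts with random weights, so using it without fine-tuning on your dataset gives near-random predictions.
- Wrong label format: For single-label classification, labels must be integer class indices from 0 to num_labels - 1.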
```python
# Wrong way: passing raw text directly
# outputs = model(text)  # This will cause an error

# Right way: tokenize first
# inputs = tokenizer(text, return_tensors='pt')
# outputs = model(**inputs)
```
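The label-format pitfall usually comes up when raw labels are strings. Those strings need to be mapped to integer class indices from 0 to num_labels - 1 before training. Below is a minimal sketch that assumes string labels. The helper name encode_labels is hypothetical. Sorting the unique labels makes the mapping the same on every run, and the reverse mapping turns predicted indices back into names.

```python
# Hypothetical helper: maps string labels to integer IDs 0..num_labels-1
# and returns both directions of the mapping.
def encode_labels(raw_labels):
    label2id = {label: i for i, label in enumerate(sorted(set(raw_labels)))}
    id2label = {i: label for label, i in label2id.items()}
    return [label2id[label] for label in raw_labels], label2id, id2label


raw = ["positive", "negative", "positive"]
labels, label2id, id2label = encode_labels(raw)
print(labels)    # [1, 0, 1]
print(label2id)  # {'negative': 0, 'positive': 1}
```

The number of classes, len(label2id), is the value to pass as num_labels when loading the model.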
Quick Reference
Remember these key steps when using BERT for text classification:
- Use BertTokenizer to convert text to tokens.
- Load BertForSequenceClassification with the correct number of labels.
- Prepare inputs with input_ids and attention_mask.
- Fine-tune the model on your labeled dataset.
- Use softmax on logits to get class probabilities.
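Applying softmax to logits means exponentiating each score and normalizing so the results sum to 1. The stdlib sketch below reproduces what torch.nn.functional.softmax(logits, dim=1) computes for a single row, then takes the argmax to pick the predicted label index. The function names softmax and predict are illustrative.

```python
import math


# Numerically stable softmax: subtract the max logit before exponentiating.
def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Return the index of the most probable class along with all probabilities.
def predict(logits):
    probs = softmax(logits)
    label = max(range(len(probs)), key=probs.__getitem__)
    return label, probs


label, probs = predict([-0.3, 0.9])
print(label)                         # 1
print([round(p, 4) for p in probs])  # [0.2315, 0.7685]
```

Subtracting the maximum logit leaves the result unchanged mathematically but avoids overflow for large scores.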
Key Takeaways
Always tokenize text inputs using BERT's tokenizer before feeding them to the model.
Fine-tune the pre-trained BERT model on your specific classification dataset for best results.
Use attention masks to tell BERT which tokens to focus on and which are padding.
Ensure labels are integer encoded starting at zero for classification tasks.
Truncate inputs longer than BERT's maximum length (512 tokens) to avoid errors, and pad shorter ones so every sequence in a batch has the same length.
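For a single sentence, truncation keeps the [CLS] token at the start and the [SEP] token at the end, so at most max_length - 2 content tokens survive. The sketch below applies that rule to lists of token IDs. truncate_with_special_tokens is a hypothetical helper, and the content IDs are placeholders. In practice you pass truncation=True and max_length to the tokenizer, which handles this for you, including sentence pairs.

```python
# Hypothetical helper: truncates a single-sentence sequence of content
# token IDs so that [CLS] + content + [SEP] fits in max_length.
CLS_ID, SEP_ID = 101, 102  # bert-base-uncased special-token IDs


def truncate_with_special_tokens(content_ids, max_length=512):
    if max_length < 2:
        raise ValueError("max_length must leave room for [CLS] and [SEP]")
    kept = content_ids[: max_length - 2]
    return [CLS_ID] + kept + [SEP_ID]


# 1000-1009 are placeholder content IDs.
ids = truncate_with_special_tokens(list(range(1000, 1010)), max_length=6)
print(ids)  # [101, 1000, 1001, 1002, 1003, 102]
```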
