
How to Use BERT for Text Classification in NLP

To use BERT for text classification in NLP, first tokenize your text with a BERT tokenizer, then feed the tokens into a pre-trained BERT model with a classification head. Finally, train or fine-tune the model on your labeled dataset to predict text categories.

📐

Syntax

Here is the basic syntax to use BERT for text classification:

Tokenizer: Converts raw text into tokens BERT understands.
Model: Pre-trained BERT with a classification layer on top.
Input: Tokenized text with attention masks.
Output: Logits (raw scores per class), which softmax turns into class probabilities and argmax into a label.

python

from transformers import BertTokenizer, BertForSequenceClassification
import torch

# Load pre-trained tokenizer and model
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)

# Example text
text = "This is a great movie!"

# Tokenize input
inputs = tokenizer(text, return_tensors='pt', padding=True, truncation=True, max_length=512)

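# Forward pass (inference only: eval mode disables dropout, no_grad skips gradient tracking)
model.eval()
with torch.no_grad():
    outputs = model(**inputs)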

# Get logits (raw predictions)
logits = outputs.logits
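
The logits are unnormalized scores, one per class. Below is a minimal sketch of turning them into probabilities and a predicted label, written in plain Python so it runs without loading a model. The logit values and the `id2label` mapping are made up for illustration. With PyTorch tensors, `torch.softmax(logits, dim=1)` and `logits.argmax(dim=1)` do the same job.

```python
import math

def softmax(scores):
    """Convert a list of raw scores into probabilities that sum to 1."""
    # Subtracting the max keeps exp() numerically stable without changing the result
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for one sentence and two classes (not real model output)
example_logits = [-1.2, 2.3]
# Hypothetical mapping from class index to a readable name
id2label = {0: "negative", 1: "positive"}

probs = softmax(example_logits)
# Index of the largest probability is the predicted class
predicted_id = max(range(len(probs)), key=probs.__getitem__)
print(probs, id2label[predicted_id])
```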

💻

Example

This example shows how to fine-tune BERT on a small dataset for binary text classification using PyTorch and Hugging Face Transformers.

python

from transformers import BertTokenizer, BertForSequenceClassification
from torch.optim import AdamW  # transformers' own AdamW is deprecated and missing from recent releases
from torch.utils.data import DataLoader, Dataset
import torch

# Sample dataset
class SimpleDataset(Dataset):
    def __init__(self, texts, labels, tokenizer):
        self.texts = texts
        self.labels = labels
        self.tokenizer = tokenizer

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, idx):
        encoding = self.tokenizer(self.texts[idx],
                                  truncation=True,
                                  padding='max_length',
                                  max_length=64,
                                  return_tensors='pt')
        item = {key: val.squeeze(0) for key, val in encoding.items()}
        item['labels'] = torch.tensor(self.labels[idx])
        return item

# Data
texts = ["I love this movie", "This film is terrible"]
labels = [1, 0]  # 1=positive, 0=negative

# Load tokenizer and model
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)

# Dataset and DataLoader
dataset = SimpleDataset(texts, labels, tokenizer)
dataloader = DataLoader(dataset, batch_size=2)

# Optimizer
optimizer = AdamW(model.parameters(), lr=5e-5)

# Training loop (1 epoch for demo)
model.train()
for batch in dataloader:
    optimizer.zero_grad()
    outputs = model(input_ids=batch['input_ids'], attention_mask=batch['attention_mask'], labels=batch['labels'])
    loss = outputs.loss
    loss.backward()
    optimizer.step()
    print(f"Loss: {loss.item():.4f}")

# Prediction example
model.eval()
with torch.no_grad():
    test_text = "I really enjoyed this movie"
    inputs = tokenizer(test_text, return_tensors='pt', truncation=True, padding=True, max_length=64)
    outputs = model(**inputs)
    probs = torch.nn.functional.softmax(outputs.logits, dim=1)
    print(f"Prediction probabilities: {probs}")

Output

Loss: 0.6931
Prediction probabilities: tensor([[0.4987, 0.5013]])

Exact values vary from run to run because the classification head starts from random weights.
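
A loss near 0.6931 on the first step is what an untrained two-class head would be expected to give. If the model assigns roughly 0.5 to each class, the cross-entropy is -ln(0.5) = ln 2. Here is a quick check of that arithmetic, assuming exactly uniform predictions (the real model is only approximately uniform):

```python
import math

num_classes = 2
# Assumption: the untrained head predicts a uniform distribution over classes
uniform_prob = 1.0 / num_classes

# Cross-entropy for one example is -log(probability assigned to the true class)
loss = -math.log(uniform_prob)
print(f"{loss:.4f}")  # prints 0.6931
```

A first-step loss far above this value can be a hint that labels or inputs are not wired up as intended.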

⚠️

Common Pitfalls

Not tokenizing input properly: BERT requires tokenized inputs with attention masks; raw text won't work.
Ignoring max length: Inputs longer than BERT's maximum length (512 tokens for bert-base-uncased, including the special [CLS] and [SEP] tokens) must be truncated or split into chunks.
Not fine-tuning: The classification head is newly initialized, so without fine-tuning on your dataset the predictions are close to random.
Wrong label format: Labels must be integers from 0 to num_labels - 1.

python

# Wrong way: passing raw text directly
# outputs = model(text)  # raises an error

# Right way: tokenize first
# inputs = tokenizer(text, return_tensors='pt')
# outputs = model(**inputs)
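
For the label-format pitfall, here is a minimal sketch of mapping string labels to integers starting at 0. The label names and the `label2id` / `id2label` variable names are illustrative. The resulting count is what would be passed as `num_labels`.

```python
# Hypothetical raw labels as they might appear in a dataset
raw_labels = ["positive", "negative", "neutral", "positive"]

# Sort the unique names so the mapping is identical on every run
label_names = sorted(set(raw_labels))
label2id = {name: idx for idx, name in enumerate(label_names)}
id2label = {idx: name for name, idx in label2id.items()}

# Integer labels in the range 0 .. num_labels - 1
encoded = [label2id[name] for name in raw_labels]
num_labels = len(label_names)
print(label2id, encoded, num_labels)
```

Keeping `id2label` around makes it easy to turn a predicted index back into a readable name.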

📊

Quick Reference

Remember these key steps when using BERT for text classification:

Use BertTokenizer to convert text to tokens.
Load BertForSequenceClassification with the correct number of labels.
Prepare inputs with input_ids and attention_mask.
Fine-tune the model on your labeled dataset.
Use softmax on logits to get class probabilities.

✅

Key Takeaways

Always tokenize text inputs using BERT's tokenizer before feeding them to the model.

Fine-tune the pre-trained BERT model on your specific classification dataset for best results.

Use attention masks to tell BERT which tokens are real input and which are padding to be ignored.

Ensure labels are integer encoded starting at zero for classification tasks.

Truncate inputs that exceed BERT's maximum token length to avoid errors, and pad shorter ones so every sequence in a batch has the same length.
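
To make the attention-mask and padding takeaways concrete, here is a plain-Python sketch of what padding and truncation do to a batch of token ids. It imitates the idea, not the tokenizer's actual implementation. The token ids are invented, the helper name `pad_and_mask` is hypothetical, and `pad_id=0` is an assumption that matches the `[PAD]` id in `bert-base-uncased`. Unlike the real tokenizer, this simple truncation does not keep the trailing `[SEP]` token.

```python
def pad_and_mask(batch_ids, max_length, pad_id=0):
    """Truncate or pad each sequence to max_length and build attention masks."""
    input_ids, attention_mask = [], []
    for ids in batch_ids:
        kept = ids[:max_length]           # truncate sequences that are too long
        n_pad = max_length - len(kept)    # padding positions needed for short ones
        input_ids.append(kept + [pad_id] * n_pad)
        # 1 marks a real token, 0 marks padding the model should ignore
        attention_mask.append([1] * len(kept) + [0] * n_pad)
    return input_ids, attention_mask

# Made-up token ids for two sequences of different lengths
batch = [[101, 1045, 2293, 102], [101, 2023, 2143, 2003, 6659, 999, 102]]
ids, mask = pad_and_mask(batch, max_length=6)
print(ids)
print(mask)
```

In practice the tokenizer call with `padding` and `truncation` set returns both `input_ids` and `attention_mask`, so a helper like this is only useful for understanding what those tensors contain.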