
Multi-class text classification in NLP

Introduction

Multi-class text classification assigns each piece of text to one of three or more categories, making large collections of text easier to understand and organize. Common examples include:

Sorting emails into categories like work, personal, or spam.
Classifying news articles into topics like sports, politics, or technology.
Organizing customer reviews by sentiment: positive, neutral, or negative.
Tagging social media posts by subject such as travel, food, or fashion.
Syntax
model = SomeClassifier()
model.fit(X_train, y_train)
predictions = model.predict(X_test)

X_train is the numeric feature matrix built from the training text.

y_train is the category label for each training example.

predict returns a predicted category for each item in X_test.

Examples
This example shows how to train a simple model to classify text into three categories.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

texts = ['I love cats', 'The sky is blue', 'Python is great']
labels = ['pets', 'nature', 'programming']

# Turn each text into a vector of word counts
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

# LogisticRegression handles more than two classes automatically
model = LogisticRegression()
model.fit(X, labels)

# Transform new text with the SAME fitted vectorizer, then predict
new_text = ['I like dogs']
X_new = vectorizer.transform(new_text)
pred = model.predict(X_new)
print(pred)
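Beyond a single predicted label, LogisticRegression can also report a probability for every class via predict_proba. A short sketch extending the example above:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

texts = ['I love cats', 'The sky is blue', 'Python is great']
labels = ['pets', 'nature', 'programming']

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

model = LogisticRegression()
model.fit(X, labels)

# predict_proba returns one probability per class, in model.classes_ order
probs = model.predict_proba(vectorizer.transform(['I like dogs']))
for cls, p in zip(model.classes_, probs[0]):
    print(f"{cls}: {p:.2f}")
```

The probabilities across all classes sum to 1, so you can see not just the winning category but how confident the model is in it.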
This example uses a pipeline to combine text vectorization and classification in one step.
from sklearn.pipeline import make_pipeline
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer

texts = ['apple is tasty', 'football is fun', 'coding is creative']
labels = ['food', 'sports', 'tech']

# The pipeline vectorizes the text, then feeds the counts to the classifier
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)

print(model.predict(['I enjoy basketball']))
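Because a pipeline behaves like a single model, it also slots directly into cross-validation to estimate how well it generalizes. A minimal sketch (the texts and labels here are made up for illustration):

```python
from sklearn.pipeline import make_pipeline
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score

# Two examples per class so each fold contains every category
texts = ['apple is tasty', 'bread is fresh',
         'football is fun', 'tennis is fast',
         'coding is creative', 'python is powerful']
labels = ['food', 'food', 'sports', 'sports', 'tech', 'tech']

model = make_pipeline(CountVectorizer(), MultinomialNB())

# 2-fold cross-validation: train on half the data, score on the other half
scores = cross_val_score(model, texts, labels, cv=2)
print(scores)
```

Each score is the accuracy on one held-out fold; averaging them gives a fairer estimate than testing on the training data.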
Sample Model

This program trains a model on a real dataset (20 Newsgroups) to classify text into three categories. It reports accuracy and one example prediction.

from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Load a small subset of data for speed
categories = ['alt.atheism', 'comp.graphics', 'sci.space']
data_train = fetch_20newsgroups(subset='train', categories=categories, remove=('headers', 'footers', 'quotes'))
data_test = fetch_20newsgroups(subset='test', categories=categories, remove=('headers', 'footers', 'quotes'))

# Convert text to numbers
vectorizer = TfidfVectorizer(stop_words='english', max_features=1000)
X_train = vectorizer.fit_transform(data_train.data)
X_test = vectorizer.transform(data_test.data)

# Train model (LogisticRegression supports multiple classes out of the box)
model = LogisticRegression(max_iter=1000)
model.fit(X_train, data_train.target)

# Predict on test data
predictions = model.predict(X_test)

# Calculate accuracy
acc = accuracy_score(data_test.target, predictions)

print(f"Accuracy: {acc:.3f}")
print(f"Sample prediction for first test text: {data_test.target_names[predictions[0]]}")
Important Notes

Text must be converted to numbers before training a model.

Multi-class means more than two categories to choose from.

Accuracy shows how often the model guesses right.

Summary

Multi-class text classification sorts text into many groups.

We turn text into numbers, then train a model to learn patterns.

Models predict categories and we check accuracy to see how well they work.