What is Multi-class text classification in NLP?

NLPml~5 mins

Multi-class text classification in NLP

Choose your learning style10 modes available

Learn Why Deep Model Try Challenge Experiment Recall Metrics

Start learning this pattern below

Jump into concepts and practice - no test required

Recommended

Test this pattern10 questions across easy, medium, and hard to know if this pattern is strong

Introduction

Multi-class text classification helps us sort text into many groups. It makes understanding and organizing text easier.

Sorting emails into categories like work, personal, or spam.

Classifying news articles into topics like sports, politics, or technology.

Organizing customer reviews by sentiment: positive, neutral, or negative.

Tagging social media posts by subject such as travel, food, or fashion.

Syntax

NLP

model = SomeClassifier()
model.fit(X_train, y_train)
predictions = model.predict(X_test)

X_train is the text data for training.

y_train is the label for each text showing its category.

Examples

This example shows how to train a simple model to classify text into three categories.

NLP

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

texts = ['I love cats', 'The sky is blue', 'Python is great']
labels = ['pets', 'nature', 'programming']

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

model = LogisticRegression(multi_class='ovr')
model.fit(X, labels)

new_text = ['I like dogs']
X_new = vectorizer.transform(new_text)
pred = model.predict(X_new)
print(pred)

This example uses a pipeline to combine text vectorization and classification in one step.

NLP

from sklearn.pipeline import make_pipeline
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer

texts = ['apple is tasty', 'football is fun', 'coding is creative']
labels = ['food', 'sports', 'tech']

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)

print(model.predict(['I enjoy basketball']))

Sample Model

This program trains a model to classify text into three categories from a real dataset. It shows accuracy and one example prediction.

NLP

from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Load a small subset of data for speed
categories = ['alt.atheism', 'comp.graphics', 'sci.space']
data_train = fetch_20newsgroups(subset='train', categories=categories, remove=('headers', 'footers', 'quotes'))
data_test = fetch_20newsgroups(subset='test', categories=categories, remove=('headers', 'footers', 'quotes'))

# Convert text to numbers
vectorizer = TfidfVectorizer(stop_words='english', max_features=1000)
X_train = vectorizer.fit_transform(data_train.data)
X_test = vectorizer.transform(data_test.data)

# Train model
model = LogisticRegression(max_iter=1000, multi_class='ovr')
model.fit(X_train, data_train.target)

# Predict on test data
predictions = model.predict(X_test)

# Calculate accuracy
acc = accuracy_score(data_test.target, predictions)

print(f"Accuracy: {acc:.3f}")
print(f"Sample prediction for first test text: {data_test.target_names[predictions[0]]}")

OutputSuccess

Important Notes

Text must be converted to numbers before training a model.

Multi-class means more than two categories to choose from.

Accuracy shows how often the model guesses right.

Summary

Multi-class text classification sorts text into many groups.

We turn text into numbers, then train a model to learn patterns.

Models predict categories and we check accuracy to see how well they work.

Practice

(1/5)

1. What is the main goal of multi-class text classification?

easy

A. To sort text into multiple categories based on content

B. To translate text into another language

C. To count the number of words in a text

D. To generate new text from a given input

Multi-class text classification in NLP

Start learning this pattern below

Practice

Solution

Step 1: Understand the task of multi-class text classification

Step 2: Compare options with the task definition

Final Answer:

Quick Check:

Solution

Step 1: Identify how models process text

Step 2: Check which option converts text to numbers

Final Answer:

Quick Check:

Solution

Step 1: Understand training data and labels

Step 2: Predict class for new text "I love dogs"

Final Answer:

Quick Check:

Solution

Step 1: Check input to model.fit()

Step 2: Identify correct input

Final Answer:

Quick Check:

Solution

Step 1: Understand class imbalance impact

Step 2: Identify best practice to handle imbalance

Final Answer:

Quick Check: