
Multi-label classification in ML Python

Introduction
Multi-label classification finds all the correct labels when one item can belong to several groups at the same time. Common examples:
Tagging photos, where a single photo can contain cats, dogs, and cars together.
Detecting emotions in a sentence, where multiple feelings like happy and surprised can appear at once.
Classifying news articles that belong to sports, politics, and health categories simultaneously.
Identifying diseases in medical images, where a patient might have more than one condition.
Recommending products that match several of a customer's interests at once.
Syntax
ML Python
model = SomeMultiLabelModel()
model.fit(X_train, Y_train)
predictions = model.predict(X_test)
Y_train and predictions are binary indicator arrays: each row is one example, and each column holds 1 if that label applies and 0 if it does not.
Train multi-label models with a per-label loss such as binary cross-entropy, which treats each label as its own yes/no decision.
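To make the target format concrete, here is a minimal sketch of such an indicator matrix, with made-up values:

```python
import numpy as np

# Multi-label target matrix: rows are samples, columns are labels.
# Sample 0 has labels 0 and 2; sample 1 has label 1 only.
Y_train = np.array([[1, 0, 1],
                    [0, 1, 0]])
print(Y_train.shape)  # (2, 3): 2 samples, 3 possible labels
```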
Examples
Using scikit-learn's MultiOutputClassifier, which fits one logistic regression classifier per label.
ML Python
from sklearn.multioutput import MultiOutputClassifier
from sklearn.linear_model import LogisticRegression

# Fits one independent logistic regression per label column
model = MultiOutputClassifier(LogisticRegression())
model.fit(X_train, Y_train)  # Y_train: binary indicator matrix
predictions = model.predict(X_test)
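The fit call above assumes Y_train is already a binary indicator matrix. If your labels arrive as tag lists instead, scikit-learn's MultiLabelBinarizer converts them; a small sketch with made-up photo tags:

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical tag lists, one per photo
tags = [["cat", "dog"], ["car"], ["cat", "car"]]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(tags)   # binary indicator matrix
print(mlb.classes_)           # label order (sorted): ['car' 'cat' 'dog']
print(Y)
```

The same mlb object can later turn predictions back into tag lists with inverse_transform.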
A simple neural network with a sigmoid output unit per label and binary cross-entropy loss, so each label gets an independent probability.
ML Python
import tensorflow as tf

model = tf.keras.Sequential([
  tf.keras.layers.Dense(64, activation='relu'),
  # One sigmoid unit per label: each output is an independent probability
  tf.keras.layers.Dense(num_labels, activation='sigmoid')
])

# Binary cross-entropy scores each label as its own yes/no decision
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X_train, Y_train, epochs=5)
predictions = model.predict(X_test)  # per-label probabilities, not 0/1 values
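Because the sigmoid outputs are probabilities, they must be thresholded to get hard 0/1 labels. A minimal NumPy sketch, using made-up probabilities in place of model.predict output:

```python
import numpy as np

# Hypothetical sigmoid outputs for 2 samples, 3 labels
probs = np.array([[0.9, 0.2, 0.7],
                  [0.1, 0.8, 0.4]])

# Threshold at 0.5 to obtain binary label predictions
labels = (probs > 0.5).astype(int)
print(labels)  # [[1 0 1]
               #  [0 1 0]]
```

The 0.5 threshold is a common default; it can be tuned per label when some labels need higher precision or recall.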
Sample Model
This example shows how to train and test a multi-label classifier using decision trees. It prints the accuracy for each label and the overall Hamming loss, which is the fraction of individual label predictions that were wrong.
ML Python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, hamming_loss

# Create sample data: 100 samples, 5 features
X = np.random.rand(100, 5)

# Create multi-label targets: 3 labels per sample
# Each label is 0 or 1 randomly
Y = np.random.randint(2, size=(100, 3))

# Split data
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42)

# Create multi-label model
model = MultiOutputClassifier(DecisionTreeClassifier(random_state=42))

# Train model
model.fit(X_train, Y_train)

# Predict
Y_pred = model.predict(X_test)

# Calculate accuracy per label
acc = [accuracy_score(Y_test[:, i], Y_pred[:, i]) for i in range(Y.shape[1])]

# Calculate Hamming loss (fraction of wrong labels)
hloss = hamming_loss(Y_test, Y_pred)

print(f"Accuracy per label: {acc}")
print(f"Hamming loss: {hloss:.3f}")
Important Notes
Multi-label classification is different from multi-class classification, where each example gets exactly one label.
Use sigmoid activation and binary cross-entropy loss in neural networks for multi-label tasks.
Metrics like Hamming loss and accuracy per label help understand multi-label model performance.
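To make the Hamming loss note concrete, here is a hand computation on tiny made-up arrays; it matches scikit-learn's hamming_loss definition:

```python
import numpy as np

Y_true = np.array([[1, 0, 1],
                   [0, 1, 0]])
Y_pred = np.array([[1, 1, 1],
                   [0, 1, 1]])

# Hamming loss = fraction of label positions predicted wrong.
# Here 2 of the 6 positions differ, so the loss is 2/6.
manual = np.mean(Y_true != Y_pred)
print(manual)  # 0.333...
```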
Summary
Multi-label classification finds all correct labels for each example, not just one.
It is useful when items belong to multiple groups at once, like tagging photos or emotions.
Special models and metrics are needed to handle multiple labels properly.