Semi-supervised learning uses both labeled and unlabeled data. What is the main advantage of this approach compared to supervised learning?
Think about the cost and effort of labeling data.
Semi-supervised learning reduces the need for many labeled examples by leveraging unlabeled data, which is usually easier and cheaper to collect.
Given the following Python code using label propagation, what is the predicted label for the unlabeled point?
from sklearn.semi_supervised import LabelPropagation
import numpy as np

# Data points: 3 labeled, 1 unlabeled
X = np.array([[1, 2], [2, 3], [3, 4], [8, 9]])
# Labels: 0, 0, 1, -1 (unlabeled)
y = np.array([0, 0, 1, -1])

model = LabelPropagation()
model.fit(X, y)
predicted_label = model.transduction_[-1]
Label propagation assigns labels based on neighbors' labels.
The unlabeled point [8, 9] is far from all three labeled points, but its nearest neighbor is [3, 4], which carries label 1, so neighbor-based propagation assigns it label 1. (Note that with the default RBF kernel and gamma=20, the similarity weights at this distance are vanishingly small; a neighbor-based kernel such as kernel='knn' makes the nearest-neighbor behavior explicit.)
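This can be checked directly. The sketch below is a minimal variation (the kernel='knn' and n_neighbors=2 settings are assumptions added for illustration, not part of the original question) that links [8, 9] to its single nearest other point, [3, 4]:

```python
from sklearn.semi_supervised import LabelPropagation
import numpy as np

X = np.array([[1, 2], [2, 3], [3, 4], [8, 9]])
y = np.array([0, 0, 1, -1])  # -1 marks the unlabeled point

# knn kernel with n_neighbors=2: each point is linked to itself
# and its single nearest other point, so [8, 9] connects to [3, 4].
model = LabelPropagation(kernel="knn", n_neighbors=2)
model.fit(X, y)

print(model.transduction_[-1])  # label inherited from [3, 4]
```

With this graph the unlabeled point receives label information only from [3, 4], so it converges to that point's label.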
You have a small labeled dataset and a large unlabeled dataset. Which model is best suited for semi-supervised learning in this scenario?
Look for a model that can iteratively label unlabeled data.
A Support Vector Machine wrapped in self-training first trains on the labeled data, then iteratively assigns pseudo-labels to high-confidence unlabeled points and retrains, making it well suited to semi-supervised learning.
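As an illustration, scikit-learn packages this pattern as SelfTrainingClassifier. The sketch below uses made-up toy data and an assumed confidence threshold of 0.6; it is not the only valid configuration:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.semi_supervised import SelfTrainingClassifier

# Two well-separated clusters; -1 marks unlabeled points.
X = np.array([[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [1.2, 1.0],
              [5.0, 5.0], [5.1, 4.9], [4.9, 5.1], [5.2, 5.0]])
y = np.array([0, 0, 0, -1, 1, 1, 1, -1])

# probability=True lets self-training compare prediction
# confidence against the threshold when pseudo-labeling.
base = SVC(probability=True, gamma="scale")
model = SelfTrainingClassifier(base, threshold=0.6)
model.fit(X, y)

print(model.predict([[1.0, 1.05], [5.05, 5.0]]))
```

The wrapper fits the SVC on the six labeled points, pseudo-labels any unlabeled point whose predicted-class probability exceeds the threshold, and refits until no more points qualify.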
In label spreading, which hyperparameter controls how much the model trusts the initial labels versus the structure of the unlabeled data?
This parameter balances label retention and propagation.
Alpha is the clamping factor: a lower alpha means more trust in the initial labels (they are largely retained), while a higher alpha means more trust in the structure inferred from the unlabeled data (propagated information can overwrite initial labels). In scikit-learn's LabelSpreading, alpha near 0 keeps the initial label information and alpha near 1 replaces it entirely with propagated information.
You trained a semi-supervised model. Which metric is most appropriate to evaluate its performance on the labeled test set?
Consider that you have true labels for the test set.
Accuracy measures how many test samples are correctly classified, which is suitable when true labels are available.
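As a minimal sketch (the label arrays are made-up placeholders standing in for the test set and the model's predictions), accuracy can be computed with scikit-learn:

```python
from sklearn.metrics import accuracy_score

# True labels of the held-out labeled test set vs. model predictions.
y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]

print(accuracy_score(y_true, y_pred))  # 4 of 5 correct -> 0.8
```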