
Types of Machine Learning in Python with scikit-learn

The three main types of machine learning are supervised learning, where models learn from labeled data; unsupervised learning, where models find patterns in unlabeled data; and reinforcement learning, where an agent learns by trial and error through rewards. In Python, the scikit-learn library covers supervised and unsupervised learning through a consistent fit/predict API; it does not implement reinforcement learning.

Syntax

Here are the basic syntax patterns for the main types of machine learning in Python using scikit-learn:

  • Supervised learning: Train a model with labeled data using fit(X_train, y_train) and predict with predict(X_test).
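  • Unsupervised learning: Fit a model on unlabeled data with fit(X), then call transform(X) (for example with PCA) to get reduced features or predict(X) (for example with KMeans) to get cluster assignments.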
  • Reinforcement learning: Not directly supported in scikit-learn, usually done with other libraries.
python
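from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Toy data so the pattern runs as-is; substitute your own arrays
X, y = make_classification(n_samples=100, n_features=4, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Supervised learning syntax: fit on features and labels, then predict
model_supervised = LogisticRegression()
model_supervised.fit(X_train, y_train)
predictions = model_supervised.predict(X_test)

# Unsupervised learning syntax: fit on features only, no labels
model_unsupervised = KMeans(n_clusters=3, random_state=42)
model_unsupervised.fit(X)
clusters = model_unsupervised.predict(X)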
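
The "transform or predict clusters" part of the unsupervised pattern can be sketched as follows. This is a minimal illustration, assuming the iris features as stand-in data; PCA and the fit_predict shortcut are not used in the original snippet and are shown here only as examples of the two styles of unsupervised estimator.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

# Features only: unsupervised estimators never see the labels
X = load_iris().data

# Transformer-style estimator: fit(X) learns the projection, transform(X) applies it
pca = PCA(n_components=2)
pca.fit(X)
X_reduced = pca.transform(X)

# Clusterer-style estimator: fit_predict(X) fits and returns one label per sample.
# n_init is set explicitly because its default changed across scikit-learn versions.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
cluster_labels = kmeans.fit_predict(X_reduced)

print(X_reduced.shape)       # (150, 2)
print(cluster_labels.shape)  # (150,)
```

Transformers return a new feature matrix, while clusterers return one integer label per row, which is why the two kinds of estimator expose different methods after fit.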

Example

This example shows supervised learning with logistic regression and unsupervised learning with k-means clustering on simple datasets.

python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Supervised learning example
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
model_supervised = LogisticRegression(max_iter=200)
model_supervised.fit(X_train, y_train)
predictions = model_supervised.predict(X_test)
accuracy = accuracy_score(y_test, predictions)

# Unsupervised learning example
model_unsupervised = KMeans(n_clusters=3, random_state=42)
model_unsupervised.fit(X)
clusters = model_unsupervised.predict(X)

print(f"Supervised learning accuracy: {accuracy:.2f}")
print(f"Unsupervised learning cluster labels (first 10): {clusters[:10]}")
Output
Supervised learning accuracy: 0.98
Unsupervised learning cluster labels (first 10): [0 0 0 0 0 0 0 0 0 0]
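
The integer that k-means assigns to each cluster is arbitrary, so cluster labels cannot be compared to the true classes with accuracy_score. One option, sketched here as an assumption rather than as part of the original example, is the adjusted Rand index, which ignores how the clusters are numbered. The iris species labels are used only for this after-the-fact check, never for fitting.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import adjusted_rand_score

iris = load_iris()
X, y = iris.data, iris.target

clusters = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)

# The adjusted Rand index is invariant to relabeling: 1.0 is a perfect match,
# values near 0.0 indicate agreement no better than chance
ari = adjusted_rand_score(y, clusters)
print(f"Adjusted Rand index vs. true species: {ari:.2f}")

# Renaming the clusters (0 -> 2, 1 -> 0, 2 -> 1) leaves the score unchanged
renamed = (clusters + 2) % 3
ari_renamed = adjusted_rand_score(y, renamed)
print(f"Same score after renaming clusters: {ari_renamed:.2f}")
```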

Common Pitfalls

Common mistakes when applying these learning types with scikit-learn include:

  • Calling fit on a supervised estimator without labels, which raises a TypeError.
  • Evaluating a model on the same data it was trained on. Without a separate test set, overfitting goes unnoticed and the reported accuracy is overly optimistic.
  • Choosing the wrong number of clusters in unsupervised learning, which can give poor results.
  • Trying to do reinforcement learning with scikit-learn, which does not support it.
python
from sklearn.linear_model import LogisticRegression

# Wrong: Trying to fit supervised model without labels
try:
    model = LogisticRegression()
    model.fit([[1, 2], [3, 4], [5, 6]])  # Missing y labels
except TypeError as e:
    print(f"Error: {e}")

# Right: Provide labels
model = LogisticRegression()
model.fit([[1, 2], [3, 4], [5, 6]], [0, 1, 0])
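Output
Error: fit() missing 1 required positional argument: 'y'

On Python 3.10 and later, the message also names the class: LogisticRegression.fit() missing 1 required positional argument: 'y'.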
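
For the pitfall about the number of clusters, one common heuristic is to compare silhouette scores across several candidate values. The sketch below assumes the iris features and a candidate range of 2 to 6; both choices are illustrative, and the silhouette score is only one of several possible criteria.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import silhouette_score

X = load_iris().data

# Score several candidate cluster counts; the silhouette needs at least 2 clusters
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    scores[k] = silhouette_score(X, labels)  # in [-1, 1], higher is better
    print(f"k={k}: silhouette={scores[k]:.3f}")

best_k = max(scores, key=scores.get)
print(f"Highest silhouette at k={best_k}")
```

The highest-scoring k is a suggestion, not a ground truth: it reflects how well separated the clusters are geometrically and need not match the number of real-world categories in the data.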

Quick Reference

| Type | Description | Example scikit-learn Classes |
|---|---|---|
| Supervised Learning | Learn from labeled data to predict outcomes | LogisticRegression, RandomForestClassifier, SVC |
| Unsupervised Learning | Find patterns or groups in unlabeled data | KMeans, DBSCAN, PCA |
| Reinforcement Learning | Learn by rewards from actions (not in scikit-learn) | Use libraries like Stable Baselines3 |
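
To make "learn by rewards from actions" concrete without relying on any reinforcement learning library, here is a minimal sketch using only the Python standard library. It is an epsilon-greedy strategy on a hypothetical two-action problem; the names REWARD_PROBS, pull, and epsilon, as well as the reward probabilities, are invented for illustration and do not come from scikit-learn or Stable Baselines3.

```python
import random

random.seed(0)

# Hypothetical environment: two actions with different hidden reward probabilities
REWARD_PROBS = [0.2, 0.8]

def pull(action):
    """Return reward 1 with the action's hidden probability, else 0."""
    return 1 if random.random() < REWARD_PROBS[action] else 0

estimates = [0.0, 0.0]  # running average reward per action
counts = [0, 0]         # how often each action was tried
epsilon = 0.1           # fraction of steps spent exploring at random

for _ in range(2000):
    if random.random() < epsilon:
        action = random.randrange(2)              # explore: try a random action
    else:
        action = estimates.index(max(estimates))  # exploit: best estimate so far
    reward = pull(action)
    counts[action] += 1
    # Incremental mean update: new_avg = old_avg + (reward - old_avg) / n
    estimates[action] += (reward - estimates[action]) / counts[action]

print(f"Estimated rewards: {estimates[0]:.2f}, {estimates[1]:.2f}")
print(f"Times each action was chosen: {counts}")
```

There is no labeled dataset here: the agent only observes the reward for the action it chose, which is the key difference from the supervised and unsupervised rows of the table.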

Key Takeaways

Supervised learning uses labeled data and is supported by scikit-learn with fit/predict methods.
Unsupervised learning finds patterns in unlabeled data and includes clustering and dimensionality reduction.
Reinforcement learning is not supported by scikit-learn and requires specialized libraries.
Split data into training and testing sets in supervised learning so that overfitting shows up in the evaluation instead of going unnoticed.
Choosing the right algorithm depends on whether your data is labeled or unlabeled.
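
The value of a held-out test set can be sketched by scoring one model on both splits. This example assumes an unconstrained DecisionTreeClassifier on iris, chosen only because such a tree can memorize its training data; it is an illustration, not a recommended model.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# An unconstrained decision tree can memorize its training data
tree = DecisionTreeClassifier(random_state=42)
tree.fit(X_train, y_train)

train_acc = tree.score(X_train, y_train)  # measured on data the model has seen
test_acc = tree.score(X_test, y_test)     # measured on held-out data

print(f"Training accuracy: {train_acc:.2f}")
print(f"Test accuracy:     {test_acc:.2f}")
```

Training accuracy alone says little about how the model will behave on new data. The split does not prevent overfitting by itself; it provides the held-out score needed to notice it.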