Types of Machine Learning in Python with scikit-learn
In Python, the main types of machine learning are supervised learning, where models learn from labeled data; unsupervised learning, where models find patterns in unlabeled data; and reinforcement learning, where models learn by trial and error through rewards. The scikit-learn library supports supervised and unsupervised learning with easy-to-use APIs.
Syntax
Here are the basic syntax patterns for the main types of machine learning in Python using scikit-learn:
- Supervised learning: Train a model with labeled data using `fit(X_train, y_train)` and predict with `predict(X_test)`.
- Unsupervised learning: Fit a model on unlabeled data with `fit(X)` and transform or predict clusters.
- Reinforcement learning: Not directly supported in `scikit-learn`, usually done with other libraries.
```python
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

# X_train, y_train, X_test and X are placeholders for your own arrays.

# Supervised learning syntax
model_supervised = LogisticRegression()
model_supervised.fit(X_train, y_train)
predictions = model_supervised.predict(X_test)

# Unsupervised learning syntax
model_unsupervised = KMeans(n_clusters=3, random_state=42)
model_unsupervised.fit(X)
clusters = model_unsupervised.predict(X)
```
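Reinforcement learning has no `fit`/`predict` counterpart in scikit-learn, but the trial-and-error idea can be sketched with NumPy alone. The following is a minimal epsilon-greedy illustration, not a full reinforcement learning setup: the three-action environment, its `true_means` rewards, and the `epsilon` value are all assumptions made up for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical environment: three actions with hidden average rewards.
true_means = np.array([0.2, 0.5, 0.8])

estimates = np.zeros(3)  # the agent's current reward estimate per action
counts = np.zeros(3)     # how often each action has been tried
epsilon = 0.1            # probability of exploring a random action

for step in range(2000):
    if rng.random() < epsilon:
        action = int(rng.integers(3))       # explore: try a random action
    else:
        action = int(np.argmax(estimates))  # exploit: use the best estimate so far
    reward = rng.normal(true_means[action], 0.1)  # noisy reward signal
    counts[action] += 1
    # Incremental mean update of the estimate for the chosen action
    estimates[action] += (reward - estimates[action]) / counts[action]

best_action = int(np.argmax(estimates))
print(f"Estimated rewards: {estimates.round(2)}")
print(f"Best action found: {best_action}")
```

The agent is never told which action is best; it only sees rewards, which is the key difference from the labeled `y_train` used in supervised learning.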
Example
This example shows supervised learning with logistic regression and unsupervised learning with k-means clustering on simple datasets.
```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Supervised learning example
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
model_supervised = LogisticRegression(max_iter=200)
model_supervised.fit(X_train, y_train)
predictions = model_supervised.predict(X_test)
accuracy = accuracy_score(y_test, predictions)

# Unsupervised learning example
model_unsupervised = KMeans(n_clusters=3, random_state=42)
model_unsupervised.fit(X)
clusters = model_unsupervised.predict(X)

print(f"Supervised learning accuracy: {accuracy:.2f}")
print(f"Unsupervised learning cluster labels (first 10): {clusters[:10]}")
```
Output
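Supervised learning accuracy: 0.98
Unsupervised learning cluster labels (first 10): [0 0 0 0 0 0 0 0 0 0]
Exact values can vary with the scikit-learn version. The cluster numbers are arbitrary identifiers, so the first ten samples may share a label other than 0.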
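The example uses `n_clusters=3` because iris is known to contain three species. When the number of groups is unknown, one common heuristic is to compare silhouette scores across several candidate values. The sketch below assumes a candidate range of 2 to 6 and sets `n_init=10` explicitly; both are illustrative choices rather than recommendations.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import silhouette_score

X = load_iris().data

# Compare candidate cluster counts; the range 2..6 is an arbitrary choice.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    scores[k] = silhouette_score(X, labels)  # ranges from -1 to 1, higher is better
    print(f"k={k}: silhouette={scores[k]:.3f}")

best_k = max(scores, key=scores.get)
print(f"Highest silhouette score at k={best_k}")
```

On iris the score typically peaks at two clusters rather than three, because two of the species overlap. Treat the metric as a guide to be combined with domain knowledge, not as a definitive answer.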
Common Pitfalls
Common mistakes when using machine learning types in Python include:
- Fitting a supervised model without labels raises an error, because `fit` requires both `X` and `y`.
- Evaluating a model on the same data it was trained on hides overfitting; hold out a test set so the score reflects unseen data.
- Choosing the wrong number of clusters in unsupervised learning can give poor results.
- Trying to do reinforcement learning with `scikit-learn`, which does not support it.
```python
from sklearn.linear_model import LogisticRegression

# Wrong: Trying to fit supervised model without labels
try:
    model = LogisticRegression()
    model.fit([[1, 2], [3, 4], [5, 6]])  # Missing y labels
except TypeError as e:
    print(f"Error: {e}")

# Right: Provide labels
model = LogisticRegression()
model.fit([[1, 2], [3, 4], [5, 6]], [0, 1, 0])
```
Output
Error: fit() missing 1 required positional argument: 'y'
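The train/test pitfall can be made visible by scoring the same model on both splits. This is a rough sketch under assumptions: the noisy synthetic dataset from `make_classification` (with `flip_y=0.2`) and the unconstrained `DecisionTreeClassifier` were chosen here only because they tend to show a clear gap.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with deliberately noisy labels (illustrative settings).
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=5, flip_y=0.2, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# An unconstrained decision tree can memorize the training set, noise included.
model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)

train_accuracy = model.score(X_train, y_train)  # scored on data the model has seen
test_accuracy = model.score(X_test, y_test)     # scored on held-out data

print(f"Training accuracy: {train_accuracy:.2f}")
print(f"Test accuracy: {test_accuracy:.2f}")
```

Reporting only the training score would make this model look perfect; the lower held-out score is what reveals the overfitting.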
Quick Reference
| Type | Description | Example scikit-learn Classes |
|---|---|---|
| Supervised Learning | Learn from labeled data to predict outcomes | LogisticRegression, RandomForestClassifier, SVC |
| Unsupervised Learning | Find patterns or groups in unlabeled data | KMeans, DBSCAN, PCA |
| Reinforcement Learning | Learn by rewards from actions (not in scikit-learn) | Use libraries like Stable Baselines3 |
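The scikit-learn classes in the table follow the same estimator conventions, so switching between them changes little code. The sketch below runs one class from each of the first two rows on iris; the hyperparameter values (`n_estimators`, `n_components`, `eps`, `min_samples`) are illustrative assumptions, not tuned settings.

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# Supervised: fit on features and labels, then predict labels.
forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(X, y)
forest_predictions = forest.predict(X[:5])

# Unsupervised (dimensionality reduction): fit on features only, then transform.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

# Unsupervised (clustering): DBSCAN has no predict(); fit_predict labels the fitted data.
dbscan_labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)
n_clusters = len(set(dbscan_labels.tolist()) - {-1})  # -1 marks noise points

print(f"Random forest predictions: {forest_predictions}")
print(f"PCA output shape: {X_reduced.shape}")
print(f"DBSCAN clusters found: {n_clusters}")
```

Note that `DBSCAN` chooses the number of clusters itself, while `KMeans` needs `n_clusters` up front.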
Key Takeaways
Supervised learning uses labeled data and is supported by scikit-learn with fit/predict methods.
Unsupervised learning finds patterns in unlabeled data and includes clustering and dimensionality reduction.
Reinforcement learning is not supported by scikit-learn and requires specialized libraries.
Always split data into training and testing sets so that overfitting in supervised learning can be detected.
Choosing the right algorithm depends on whether your data is labeled or unlabeled.