Python sklearn Program to Classify Iris Flowers
Use load_iris() from sklearn.datasets to load the data, train a model such as LogisticRegression() on the iris features, and predict flower classes with model.predict().
How to Think About It
Algorithm
1. Load the iris dataset with load_iris().
2. Split the data into training and test sets with train_test_split().
3. Create a LogisticRegression model and fit it on the training set.
4. Predict classes for the test set with model.predict().
5. Compare predictions with the true labels to compute accuracy.
Code
```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=42
)
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, y_pred):.2f}")
print(f"Predictions: {y_pred[:5]}")
```
Dry Run
Let's trace the program predicting iris classes for the first test samples.
1. Load iris data: iris.data has shape (150, 4) and iris.target has shape (150,).
2. Split data: 112 training samples, 38 testing samples.
3. Train model: the model learns patterns from the training features and labels.
4. Predict test data: the model predicts classes for all 38 test samples.
5. Calculate accuracy: accuracy is computed by comparing predicted and true labels.
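The shapes and split sizes in the dry run can be verified directly; with the default test_size=0.25, train_test_split holds out 38 of the 150 samples for testing:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
print(iris.data.shape)    # (150, 4)
print(iris.target.shape)  # (150,)

# Default test_size=0.25 rounds up to 38 test samples, leaving 112 for training
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=42
)
print(len(X_train), len(X_test))  # 112 38
```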
| Test Sample Index | True Label | Predicted Label |
|---|---|---|
| 0 | 1 | 1 |
| 1 | 0 | 0 |
| 2 | 2 | 2 |
| 3 | 1 | 1 |
| 4 | 1 | 1 |
Why This Works
Step 1: Load Data
We use load_iris() to get flower features and their species labels.
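The loaded Bunch object also carries the human-readable feature and species names, which is useful for interpreting the integer labels 0, 1, and 2:

```python
from sklearn.datasets import load_iris

iris = load_iris()
print(iris.feature_names)
# ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
print(list(iris.target_names))
# ['setosa', 'versicolor', 'virginica']
```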
Step 2: Train Model
The logistic regression model learns to separate flower types based on features.
Step 3: Predict and Evaluate
The model predicts flower classes on new data and we check accuracy to see how well it learned.
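Beyond a single accuracy number, a confusion matrix shows which species get confused with which. A minimal sketch, reusing the same split and model as above:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=42
)
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Rows are true classes, columns are predicted classes;
# off-diagonal entries count the misclassified test samples
cm = confusion_matrix(y_test, y_pred)
print(cm)
```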
Alternative Approaches
Decision Tree:
```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=42
)
model = DecisionTreeClassifier()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, y_pred):.2f}")
```
K-Nearest Neighbors:
```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=42
)
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, y_pred):.2f}")
```
Complexity: O(n * d * i) time, O(n * d) space
Time Complexity
Training logistic regression takes time proportional to the number of samples (n), features (d), and solver iterations (i). Prediction is faster, O(n * d).
Space Complexity
Stores the dataset and model parameters, proportional to number of samples and features.
Which Approach is Fastest?
Logistic regression is fast and accurate on small datasets like iris; decision trees can overfit without pruning, and KNN prediction slows down as the dataset grows because it compares each query against every stored sample.
| Approach | Time | Space | Best For |
|---|---|---|---|
| Logistic Regression | O(n * d * i) | O(n * d) | Fast, simple linear classification |
| Decision Tree | O(n * d * log n) | O(n * d) | Interpretable models, non-linear data |
| K-Nearest Neighbors | O(n^2 * d) | O(n * d) | Simple, non-parametric, but slower on large data |
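On a dataset this small, all three approaches score similarly well; the comparison can be run side by side (random_state is fixed here only to make the split and tree reproducible):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=42
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=200),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "KNN (k=3)": KNeighborsClassifier(n_neighbors=3),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: {results[name]:.2f}")
```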
Note: if max_iter is set too low, the logistic regression solver may stop before converging and sklearn emits a ConvergenceWarning; that is why the code above raises it to max_iter=200.
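The warning is easy to reproduce by deliberately starving the solver of iterations (5 is far below what the lbfgs solver needs on this data):

```python
import warnings

from sklearn.datasets import load_iris
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression

iris = load_iris()

# Capture warnings so we can check that the solver complained
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    LogisticRegression(max_iter=5).fit(iris.data, iris.target)

warned = any(issubclass(w.category, ConvergenceWarning) for w in caught)
print(warned)
```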