MLOps Program · Beginner · 2 min read

Python sklearn Program to Classify Iris Flowers

Use from sklearn.datasets import load_iris to load data, then train a model like LogisticRegression() on the iris data, and predict flower classes with model.predict().
📋

Examples

Input: Features: [5.1, 3.5, 1.4, 0.2]
Output: Predicted class: 0 (setosa)

Input: Features: [6.7, 3.0, 5.2, 2.3]
Output: Predicted class: 2 (virginica)

Input: Features: [5.9, 3.0, 4.2, 1.5]
Output: Predicted class: 1 (versicolor)
🧠

How to Think About It

First, load the iris flower data which has features and labels. Then, split the data into training and testing sets. Train a simple model like logistic regression on the training data. Finally, use the model to predict the class of new iris flower samples.
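That whole workflow can be sketched end to end for a single new flower. This is a minimal sketch that trains on the full dataset just to show the shape of the process; the sample measurements match the first example above.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Load features (sepal/petal lengths and widths) and species labels
iris = load_iris()

# Train on the full dataset for this quick sketch (the main example splits it properly)
model = LogisticRegression(max_iter=200)
model.fit(iris.data, iris.target)

# Predict the species of one new measurement
sample = [[5.1, 3.5, 1.4, 0.2]]
pred = model.predict(sample)[0]
print(pred, iris.target_names[pred])  # 0 setosa
```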
📐

Algorithm

1. Load the iris dataset with features and labels
2. Split the dataset into training and testing parts
3. Create a logistic regression model
4. Train the model using the training data
5. Predict the flower class on the test data
6. Print the accuracy and example predictions
💻

Code

sklearn
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, random_state=42)
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, y_pred):.2f}")
print(f"Predictions: {y_pred[:5]}")
Output
Accuracy: 1.00
Predictions: [1 0 2 1 1]
🔍

Dry Run

Let's trace the program predicting iris classes for the first test samples.

1. Load iris data: iris.data shape is (150, 4), iris.target shape is (150,)
2. Split data: 112 training samples, 38 testing samples
3. Train model: the model learns patterns from the training features and labels
4. Predict test data: the model predicts classes for the 38 test samples
5. Calculate accuracy: predicted labels are compared against the true labels

Test Sample Index | True Label | Predicted Label
0 | 1 | 1
1 | 0 | 0
2 | 2 | 2
3 | 1 | 1
4 | 1 | 1
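The dry-run table can be reproduced in code by printing the first five true and predicted labels, using the same split and model as the main example:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=42
)
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Index, true label, predicted label for the first five test samples
for i in range(5):
    print(i, y_test[i], y_pred[i])
```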
💡

Why This Works

Step 1: Load Data

We use load_iris() to get flower features and their species labels.
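A quick inspection confirms what load_iris() hands back: a 150-by-4 feature matrix, integer labels 0 to 2, and the species names:

```python
from sklearn.datasets import load_iris

iris = load_iris()
print(iris.data.shape)           # (150, 4)
print(iris.feature_names)        # sepal/petal length and width, in cm
print(list(iris.target_names))   # ['setosa', 'versicolor', 'virginica']
```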

Step 2: Train Model

The logistic regression model learns to separate flower types based on features.
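Because logistic regression is probabilistic, predict_proba shows how confident that separation is. A sketch, trained on the full dataset for brevity:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris = load_iris()
model = LogisticRegression(max_iter=200)
model.fit(iris.data, iris.target)

# Per-class probabilities for a clearly-setosa sample: highest for class 0
proba = model.predict_proba([[5.1, 3.5, 1.4, 0.2]])[0]
print(proba.round(3))
```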

Step 3: Predict and Evaluate

The model predicts flower classes on new data and we check accuracy to see how well it learned.
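Beyond a single accuracy number, a confusion matrix shows which species get mixed up with which. A sketch using the same split as the main example:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=42
)
model = LogisticRegression(max_iter=200).fit(X_train, y_train)

# Rows are true classes, columns predicted; off-diagonal counts are mistakes
cm = confusion_matrix(y_test, model.predict(X_test))
print(cm)
```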

🔄

Alternative Approaches

Decision Tree Classifier
sklearn
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, random_state=42)
model = DecisionTreeClassifier()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, y_pred):.2f}")
Decision trees are easy to interpret but can overfit small datasets.
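That interpretability can be seen directly: export_text prints the learned rules as nested if/else thresholds on the features. A sketch, fitting on the full dataset:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(random_state=42).fit(iris.data, iris.target)

# Human-readable rules, e.g. a petal-size threshold that isolates class 0
rules = export_text(tree, feature_names=iris.feature_names)
print(rules)
```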
K-Nearest Neighbors
sklearn
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, random_state=42)
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, y_pred):.2f}")
KNN is simple and effective but slower on large datasets.
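A single train/test split can flatter or punish any of these models by luck. One way to compare the three approaches more robustly is cross-validation, which averages accuracy over several folds; a sketch with 5 folds:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
models = {
    "Logistic Regression": LogisticRegression(max_iter=200),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "KNN (k=3)": KNeighborsClassifier(n_neighbors=3),
}

# Mean accuracy over 5 stratified folds, a fairer comparison than one split
means = {}
for name, model in models.items():
    means[name] = cross_val_score(model, iris.data, iris.target, cv=5).mean()
    print(f"{name}: {means[name]:.3f}")
```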

Complexity: O(n * d * i) time, O(n * d) space

Time Complexity

Training logistic regression takes time proportional to number of samples (n), features (d), and iterations (i). Prediction is faster, O(n * d).

Space Complexity

Stores the dataset and model parameters, proportional to number of samples and features.

Which Approach is Fastest?

Logistic regression is fast and efficient on a small dataset like iris. Decision trees can overfit its 150 samples, and KNN defers all work to prediction time, comparing each query against every stored training sample.

Approach | Time | Space | Best For
Logistic Regression | O(n * d * i) | O(n * d) | Fast, simple linear classification
Decision Tree | O(n * d * log n) | O(n * d) | Interpretable models, non-linear data
K-Nearest Neighbors | O(n^2 * d) | O(n * d) | Simple, non-parametric, but slower on large data
💡
Always split your data into training and testing sets to fairly evaluate your model.
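On top of splitting, passing stratify keeps the class balance identical in both halves, which matters on small datasets. A sketch using the stratify parameter of train_test_split:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()

# stratify=iris.target preserves the 50/50/50 class balance in both splits
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=42, stratify=iris.target
)
counts = np.bincount(y_test)
print(counts)  # roughly equal counts per class
```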
⚠️
If max_iter is set too low, the logistic regression solver may stop before converging; scikit-learn raises a ConvergenceWarning and the learned coefficients may be unreliable.
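Besides raising max_iter, standardizing the features usually lets the solver converge in far fewer iterations. A sketch using a pipeline with StandardScaler in front of the classifier:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

iris = load_iris()

# Scaling the features first makes the default max_iter comfortably sufficient
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(iris.data, iris.target)
print(f"{model.score(iris.data, iris.target):.2f}")
```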