Python sklearn Program to Classify Iris Flowers
Use load_iris() from sklearn.datasets to load the data, train a model such as LogisticRegression() on the iris features, and predict flower classes with model.predict().
How to Think About It
Algorithm
1. Load the iris dataset with load_iris().
2. Split the data into training and test sets with train_test_split().
3. Create a LogisticRegression model and fit it on the training set.
4. Predict classes for the test set with model.predict().
5. Compare predictions with the true labels to compute accuracy.
Code
```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=42
)
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, y_pred):.2f}")
print(f"Predictions: {y_pred[:5]}")
```
Dry Run
Let's trace the program predicting iris classes for the first test samples.
1. Load iris data: iris.data has shape (150, 4) and iris.target has shape (150,).
2. Split data: 112 training samples, 38 testing samples.
3. Train model: the model learns patterns from the training features and labels.
4. Predict test data: the model predicts classes for all 38 test samples.
5. Calculate accuracy: accuracy is computed by comparing predicted and true labels.
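The shapes and split sizes in the dry run can be verified directly; with the default test_size=0.25, train_test_split holds out 38 of the 150 samples for testing:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
print(iris.data.shape)    # (150, 4)
print(iris.target.shape)  # (150,)

# Default test_size=0.25 rounds up to 38 test samples, leaving 112 for training
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=42
)
print(len(X_train), len(X_test))  # 112 38
```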
| Test Sample Index | True Label | Predicted Label |
|---|---|---|
| 0 | 1 | 1 |
| 1 | 0 | 0 |
| 2 | 2 | 2 |
| 3 | 1 | 1 |
| 4 | 1 | 1 |
Why This Works
Step 1: Load Data
We use load_iris() to get flower features and their species labels.
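The loaded Bunch object also carries the human-readable feature and species names, which is useful for interpreting the integer labels 0, 1, and 2:

```python
from sklearn.datasets import load_iris

iris = load_iris()
print(iris.feature_names)
# ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
print(list(iris.target_names))
# ['setosa', 'versicolor', 'virginica']
```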
Step 2: Train Model
The logistic regression model learns to separate flower types based on features.
Step 3: Predict and Evaluate
The model predicts flower classes on new data and we check accuracy to see how well it learned.
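Beyond a single accuracy number, a confusion matrix shows which species get confused with which. A minimal sketch, reusing the same split and model as above:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=42
)
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Rows are true classes, columns are predicted classes;
# off-diagonal entries count the misclassified test samples
cm = confusion_matrix(y_test, y_pred)
print(cm)
```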
Alternative Approaches
Decision Tree:
```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=42
)
model = DecisionTreeClassifier()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, y_pred):.2f}")
```
K-Nearest Neighbors:
```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=42
)
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, y_pred):.2f}")
```
Complexity: O(n * d * i) time, O(n * d) space
Time Complexity
Training logistic regression takes time proportional to the number of samples (n), features (d), and solver iterations (i). Prediction is faster, O(n * d).
Space Complexity
Stores the dataset and model parameters, proportional to number of samples and features.
Which Approach is Fastest?
Logistic regression is fast and accurate on small datasets like iris; decision trees can overfit without pruning, and KNN prediction slows down as the dataset grows because it compares each query against every stored sample.
| Approach | Time | Space | Best For |
|---|---|---|---|
| Logistic Regression | O(n * d * i) | O(n * d) | Fast, simple linear classification |
| Decision Tree | O(n * d * log n) | O(n * d) | Interpretable models, non-linear data |
| K-Nearest Neighbors | O(n^2 * d) | O(n * d) | Simple, non-parametric, but slower on large data |
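On a dataset this small, all three approaches score similarly well; the comparison can be run side by side (random_state is fixed here only to make the split and tree reproducible):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=42
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=200),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "KNN (k=3)": KNeighborsClassifier(n_neighbors=3),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: {results[name]:.2f}")
```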
Note: if max_iter is set too low, the logistic regression solver may stop before converging and sklearn emits a ConvergenceWarning; that is why the code above raises it to max_iter=200.
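The warning is easy to reproduce by deliberately starving the solver of iterations (5 is far below what the lbfgs solver needs on this data):

```python
import warnings

from sklearn.datasets import load_iris
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression

iris = load_iris()

# Capture warnings so we can check that the solver complained
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    LogisticRegression(max_iter=5).fit(iris.data, iris.target)

warned = any(issubclass(w.category, ConvergenceWarning) for w in caught)
print(warned)
```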