ML Python programming · ~5 mins

K-Nearest Neighbors (KNN) in ML Python

Introduction

K-Nearest Neighbors helps us guess the label of new data by looking at the closest examples we already know.

When you want to classify emails as spam or not spam based on similar past emails.
When you want to predict if a fruit is an apple or orange by comparing its features to known fruits.
When you want to recommend movies by finding users with similar tastes.
When you want to assign new customers to known segments based on their shopping habits.
When you want a simple model with no separate training phase.
Syntax
ML Python
from sklearn.neighbors import KNeighborsClassifier

model = KNeighborsClassifier(n_neighbors=k)
model.fit(X_train, y_train)
predictions = model.predict(X_test)

n_neighbors is the number of closest points to look at.

You must fit the model with training data before predicting.

Examples
Use 3 neighbors to decide the class.
ML Python
model = KNeighborsClassifier(n_neighbors=3)
Neighbors closer to the point have more influence.
ML Python
model = KNeighborsClassifier(n_neighbors=5, weights='distance')
Train the model and then predict labels for new data.
ML Python
model.fit(X_train, y_train)
predictions = model.predict(X_test)
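Once fitted, the model can also report how the neighbor vote split for a single new point via predict_proba. A minimal sketch on made-up toy data (the feature values are illustrative, not from any real dataset):

```python
from sklearn.neighbors import KNeighborsClassifier

# Toy training data: two features, two well-separated classes
X_train = [[1.0, 1.1], [1.2, 0.9], [0.9, 1.0],
           [5.0, 5.2], [5.1, 4.9], [4.8, 5.0]]
y_train = [0, 0, 0, 1, 1, 1]

model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)

# Classify one new point and show the fraction of its 3 neighbors in each class
new_point = [[1.1, 1.0]]
print(model.predict(new_point))        # majority class among the 3 nearest neighbors
print(model.predict_proba(new_point))  # per-class share of the neighbor vote
```

Here all three nearest neighbors of the new point belong to class 0, so the vote is unanimous.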
Sample Program

This program loads iris flower data, splits it, trains a KNN model with 3 neighbors, predicts flower types on test data, and prints predictions and accuracy.

ML Python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load example data
iris = load_iris()
X, y = iris.data, iris.target

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Create KNN model with 3 neighbors
model = KNeighborsClassifier(n_neighbors=3)

# Train the model
model.fit(X_train, y_train)

# Predict on test data
predictions = model.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, predictions)

print(f"Predictions: {predictions}")
print(f"Accuracy: {accuracy:.2f}")
Important Notes

KNN works best with small to medium datasets because it compares each new point to all training points at prediction time.

Choosing the right number of neighbors (k) is important: too small can be noisy, too large can smooth out details.
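A common way to pick k is to try several values with cross-validation and keep the one that scores best. A sketch using scikit-learn's cross_val_score on the iris data (the candidate range 1-15 is an arbitrary choice for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Score each candidate k with 5-fold cross-validation
scores = {}
for k in range(1, 16, 2):  # odd values of k help avoid tied votes
    model = KNeighborsClassifier(n_neighbors=k)
    scores[k] = cross_val_score(model, X, y, cv=5).mean()

best_k = max(scores, key=scores.get)
print(f"Best k: {best_k} (mean accuracy {scores[best_k]:.2f})")
```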

Features should be scaled (for example, standardized to zero mean and unit variance) for KNN to work well, because its predictions are based on distances: an unscaled feature with a large range can dominate the distance calculation.
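Scaling can be bundled with the classifier in a pipeline so the same transformation learned on the training data is applied again at prediction time. A sketch using StandardScaler with the iris data from the sample program:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Standardize features to zero mean / unit variance before computing distances
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
model.fit(X_train, y_train)

print(f"Test accuracy: {model.score(X_test, y_test):.2f}")
```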

Summary

KNN predicts labels by looking at the closest known examples.

It's easy to use and has no separate training phase, but it can be slow on big datasets.

Choosing the number of neighbors and scaling data are key to good results.