K-Nearest Neighbors helps us guess the label of new data by looking at the closest examples we already know.
K-Nearest Neighbors (KNN) in Machine Learning with Python
```python
from sklearn.neighbors import KNeighborsClassifier

# k is a placeholder for the number of neighbors you choose
model = KNeighborsClassifier(n_neighbors=k)
model.fit(X_train, y_train)
predictions = model.predict(X_test)
```
n_neighbors is the number of closest points to look at.
You must fit the model with training data before predicting.
```python
# A basic model that gives each of the 3 nearest neighbors an equal vote
model = KNeighborsClassifier(n_neighbors=3)

# An alternative: 5 neighbors, with closer neighbors weighted more heavily
model = KNeighborsClassifier(n_neighbors=5, weights='distance')

model.fit(X_train, y_train)
predictions = model.predict(X_test)
```
This program loads iris flower data, splits it, trains a KNN model with 3 neighbors, predicts flower types on test data, and prints predictions and accuracy.
```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load example data
iris = load_iris()
X, y = iris.data, iris.target

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Create KNN model with 3 neighbors
model = KNeighborsClassifier(n_neighbors=3)

# Train the model
model.fit(X_train, y_train)

# Predict on test data
predictions = model.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, predictions)
print(f"Predictions: {predictions}")
print(f"Accuracy: {accuracy:.2f}")
```
KNN works best with small to medium datasets because, at prediction time, it compares each new point against every training point.
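To make the "compares against all training points" step concrete, here is a minimal from-scratch sketch of KNN prediction using NumPy. The function name `knn_predict` and the tiny 2D dataset are made up for illustration; scikit-learn's `KNeighborsClassifier` does the same thing with optimized data structures.

```python
import numpy as np

def knn_predict(X_train, y_train, x_new, k=3):
    """Predict the label of x_new by majority vote among its k nearest training points."""
    # Distance from x_new to every training point -- this is the costly step
    distances = np.linalg.norm(X_train - x_new, axis=1)
    # Indices of the k closest training points
    nearest = np.argsort(distances)[:k]
    # Majority vote among their labels
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Tiny illustrative dataset: two clusters of 2D points
X_train = np.array([[0, 0], [0, 1], [5, 5], [6, 5]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([5, 4])))  # the closest points belong to class 1
```

Because every prediction scans the whole training set, runtime grows with dataset size, which is why KNN slows down on big data.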
Choosing the right number of neighbors (k) is important: too small can be noisy, too large can smooth out details.
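One common way to pick k is to score several candidate values with cross-validation and keep the best one. This is a sketch using scikit-learn's `cross_val_score` on the iris data from the example above; the candidate list `[1, 3, 5, 7, 9]` is just an illustrative choice.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Score a few candidate values of k with 5-fold cross-validation
for k in [1, 3, 5, 7, 9]:
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5)
    print(f"k={k}: mean accuracy {scores.mean():.3f}")
```

Odd values of k are often preferred for binary problems so that votes cannot tie.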
Features should be scaled (for example with standardization or min-max normalization) for KNN to work well, because distance calculations let large-valued features drown out small-valued ones.
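A convenient way to apply scaling is to chain a scaler and the classifier in a pipeline, so the same transformation is learned from the training data and reused at prediction time. This sketch uses `StandardScaler` with the iris data; the variable name `scaled_knn` is ours.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Scale features to zero mean and unit variance before computing distances
scaled_knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
scaled_knn.fit(X_train, y_train)
print(f"Test accuracy: {scaled_knn.score(X_test, y_test):.2f}")
```

The pipeline fits the scaler only on the training split, which avoids leaking test-set statistics into the model.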
KNN predicts labels by looking at the closest known examples.
It is easy to use and training is essentially free (fitting just stores the data), but prediction can be slow on big datasets.
Choosing the number of neighbors and scaling data are key to good results.