k-NN (k-Nearest Neighbours)

What is k-NN?

k-Nearest Neighbours (k-NN) is a machine learning algorithm used for both classification and regression tasks. It is a non-parametric method built on a simple idea: for a given input point, find the k nearest data points in the training set, then use those neighbours to predict the output.
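
To make the idea concrete, here is a minimal from-scratch sketch in Python (the function name knn_predict, the choice of Euclidean distance, and the toy data are illustrative assumptions, not part of any particular library):

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    # Euclidean distance from the query point x to every training point
    distances = np.linalg.norm(X_train - x, axis=1)
    # Indices of the k closest training points
    nearest = np.argsort(distances)[:k]
    # Classification: majority vote among the k nearest labels
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Three points near the origin (class 0) and one far away (class 1)
X_train = np.array([[0, 0], [0, 1], [1, 0], [5, 5]])
y_train = np.array([0, 0, 0, 1])
print(knn_predict(X_train, y_train, np.array([0.5, 0.5])))  # prints 0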

Application & Uses of k-NN

In the classification task, k-NN assigns the input point the majority class among its k nearest neighbours. In the regression task, it predicts the average of the neighbours' output values.
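
The same neighbour lookup serves both tasks; only the aggregation step differs. As a quick sketch (the one-dimensional toy data below is made up purely for illustration), scikit-learn exposes the two variants as separate estimators:

from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

X = [[0], [1], [2], [3]]

# Classification: majority vote among the 3 nearest neighbours
clf = KNeighborsClassifier(n_neighbors=3).fit(X, [0, 0, 1, 1])
print(clf.predict([[1.1]]))  # [0]: two of the three nearest points have class 0

# Regression: average of the 3 nearest neighbours' target values
reg = KNeighborsRegressor(n_neighbors=3).fit(X, [0.0, 0.0, 1.0, 1.0])
print(reg.predict([[1.1]]))  # [0.333...]: mean of 0.0, 0.0 and 1.0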

k-NN is a simple yet effective algorithm that can be applied to a wide range of tasks, including image recognition, speech recognition, and natural language processing. Although k-NN itself is a supervised method, the underlying nearest-neighbour search is also useful in unsupervised settings such as recommendation systems and anomaly detection.
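
As one illustration of those unsupervised uses, distance to a point's nearest neighbours is a common anomaly score: points far from all their neighbours are likely outliers. Below is a minimal sketch using scikit-learn's NearestNeighbors, where the toy data and the choice of score are assumptions for illustration:

import numpy as np
from sklearn.neighbors import NearestNeighbors

# A tight cluster near the origin plus one obvious outlier (toy data)
X = np.array([[0.0, 0.0], [0.1, 0.1], [0.2, 0.0], [0.0, 0.2], [5.0, 5.0]])

# Distance to each point's nearest other point (column 0 is the point itself)
nn = NearestNeighbors(n_neighbors=2).fit(X)
distances, _ = nn.kneighbors(X)
scores = distances[:, -1]
print(scores)  # the point at (5, 5) has by far the largest score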

Benefits of k-NN

k-NN has the following benefits:

  • Simplicity: k-NN is a simple algorithm that is easy to understand and implement. It does not require any complex mathematical calculations or assumptions, making it a good choice for beginners and for quick prototyping.
  • Flexibility: k-NN can work with any type of data, including numerical, categorical, and mixed data. It can also be used for both classification and regression tasks.
  • Non-parametric: k-NN is a non-parametric algorithm, which means it does not assume any specific distribution or form for the data. This makes it suitable for data that does not follow a specific distribution or for situations where the underlying distribution is unknown.
  • Interpretable: k-NN produces results that are easy to interpret, as each prediction can be traced directly back to the specific neighbours that produced it (the majority label in classification, the neighbour average in regression). This makes it useful in situations where interpretability is important, such as in medical or legal applications.
  • Low bias: k-NN has low bias, especially for small values of k, which means it can fit the data closely without assuming any specific form or structure. This makes it useful in situations where the relationship between the input and output variables is complex or nonlinear.
  • Lazy learning: k-NN requires no explicit training phase; fitting essentially just stores the data. This makes it convenient when the data is constantly changing or growing, although the trade-off is that prediction must search through the stored data, which can be slow for large datasets (see the timing sketch after this list).
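
To see the lazy-learning trade-off from the last point in practice, the sketch below times fit against predict on synthetic data (the make_classification parameters are arbitrary assumptions, and exact timings will vary by machine):

import time
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=50_000, n_features=20, random_state=0)
knn = KNeighborsClassifier(n_neighbors=3)

start = time.perf_counter()
knn.fit(X, y)  # "training" just stores the data (plus a neighbour-search index)
print(f"fit:     {time.perf_counter() - start:.3f}s")

start = time.perf_counter()
knn.predict(X[:1000])  # prediction does the real work: searching for neighbours
print(f"predict: {time.perf_counter() - start:.3f}s")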

Example Code for k-NN Algorithm in Python

Below is example code for the k-nearest neighbours (k-NN) algorithm in Python using the scikit-learn library:

from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load iris dataset
iris = load_iris()

# Split dataset into training and test sets (fixed random_state for reproducibility)
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2, random_state=42)

# Create k-NN classifier with k=3
knn = KNeighborsClassifier(n_neighbors=3)

# Train the classifier on the training set
knn.fit(X_train, y_train)

# Predict the classes of test set
y_pred = knn.predict(X_test)

# Print the accuracy of the classifier
print("Accuracy:", knn.score(X_test, y_test))

In this example, we first load the iris dataset and split it into training and test sets with the train_test_split function. We then create a k-NN classifier with k=3 using the KNeighborsClassifier class and train it on the training set with the fit method. Finally, we use the trained classifier to predict the classes of the test set with the predict method, and print its accuracy with the score method.
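
One practical caveat the example glosses over: because k-NN relies on distances, features on larger scales dominate the neighbour search, so standardising features usually helps. Below is a hedged sketch, continuing from the variables above, that adds scaling and a simple cross-validated search over k (the candidate values for k are an arbitrary assumption):

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import GridSearchCV

# Scale features before computing distances, then search over candidate k values
pipe = make_pipeline(StandardScaler(), KNeighborsClassifier())
param_grid = {"kneighborsclassifier__n_neighbors": [1, 3, 5, 7, 9]}
grid = GridSearchCV(pipe, param_grid, cv=5)
grid.fit(X_train, y_train)

print("Best k:", grid.best_params_)
print("Test accuracy:", grid.score(X_test, y_test))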

Additional Resources