What is Out-of-Distribution Detection?
Out-of-distribution (OOD) detection is the process of identifying data samples that come from a different distribution than the one used to train a machine learning model. OOD detection is essential for the robustness and reliability of a model, because models can produce unreliable predictions, including high-confidence errors, when faced with data unlike anything seen during training.
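To make the "high-confidence errors" point concrete, here is a minimal sketch using a toy two-class linear classifier. The weight matrix W and the two input points are hypothetical values chosen for illustration, not taken from any trained model. The idea is that softmax confidence alone can grow as an input moves away from the training data, so it may not signal that the input is OOD.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array of logits."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Hypothetical 2-class linear classifier: logits = W @ x
W = np.array([[1.0, 0.0],
              [-1.0, 0.0]])

x_in = np.array([1.0, 0.5])      # assumed to resemble the training data
x_far = np.array([20.0, -30.0])  # assumed to be far from anything seen in training

# Confidence is the largest softmax probability
conf_in = softmax(W @ x_in).max()
conf_far = softmax(W @ x_far).max()

print(f"confidence near training data: {conf_in:.3f}")
print(f"confidence far from training data: {conf_far:.3f}")
```

In this toy setup the far-away point receives the higher confidence, even though the classifier has no basis for a reliable prediction there.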
Example of Out-of-Distribution Detection:
Suppose we have trained a deep neural network for handwritten digit recognition using the MNIST dataset, and we want to detect if the model is presented with images of letters instead of digits. In this case, the images of letters are OOD samples.
Here’s a Python code example using the PyOD library. Note that it uses CIFAR-10 images, rather than letters, as the OOD samples:
import numpy as np
import matplotlib.pyplot as plt
from keras.datasets import mnist, cifar10
from pyod.models.ocsvm import OCSVM

# Load MNIST (in-distribution) and CIFAR-10 (out-of-distribution); labels are not needed
(x_train_mnist, _), (x_test_mnist, _) = mnist.load_data()
(_, _), (x_test_cifar, _) = cifar10.load_data()

# Flatten MNIST images to 784-dimensional vectors scaled to [0, 1]
x_train_mnist = x_train_mnist.reshape(-1, 28 * 28) / 255.0
x_test_mnist = x_test_mnist.reshape(-1, 28 * 28) / 255.0

# Convert CIFAR-10 to grayscale and center-crop 32x32 -> 28x28, so each image
# has the same 784 features the detector is trained on
x_test_cifar = x_test_cifar.mean(axis=3)[:, 2:30, 2:30].reshape(-1, 28 * 28) / 255.0

# Train a One-Class SVM on a random subset of the MNIST training data.
# The subset only keeps training time manageable; use the full set if time allows.
rng = np.random.default_rng(0)
subset = rng.choice(len(x_train_mnist), size=5000, replace=False)
ocsvm = OCSVM()
ocsvm.fit(x_train_mnist[subset])

# Score the MNIST and CIFAR-10 test data (in PyOD, higher scores mean more anomalous)
mnist_scores = ocsvm.decision_function(x_test_mnist)
cifar_scores = ocsvm.decision_function(x_test_cifar)

# Plot the decision scores
plt.hist(mnist_scores, bins='auto', alpha=0.7, label='MNIST (in-distribution)')
plt.hist(cifar_scores, bins='auto', alpha=0.7, label='CIFAR-10 (out-of-distribution)')
plt.xlabel('Decision Score')
plt.ylabel('Frequency')
plt.legend()
plt.show()
In this example, we train a One-Class SVM on the MNIST training data and score both the MNIST and CIFAR-10 test sets. The histogram overlays the decision scores of the in-distribution (MNIST) and out-of-distribution (CIFAR-10) samples. PyOD assigns higher scores to more anomalous samples, so if the detector separates the two datasets well, the CIFAR-10 scores cluster to the right of the MNIST scores with little overlap.
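A histogram shows separation only visually. One way to quantify it, and to turn scores into an actual OOD decision, is sketched below. The helper names auroc and flag_ood, the 95% quantile threshold, and the synthetic scores are all assumptions made for illustration; they are not part of PyOD or of the example above. The sketch assumes higher scores mean more anomalous.

```python
import numpy as np

def auroc(in_scores, ood_scores):
    """Probability that a random OOD sample scores higher than a random
    in-distribution sample (ties count as one half)."""
    in_scores = np.asarray(in_scores, dtype=float)
    ood_scores = np.asarray(ood_scores, dtype=float)
    # Compare every OOD score against every in-distribution score
    greater = (ood_scores[:, None] > in_scores[None, :]).sum()
    ties = (ood_scores[:, None] == in_scores[None, :]).sum()
    return (greater + 0.5 * ties) / (len(in_scores) * len(ood_scores))

def flag_ood(scores, in_scores, keep_fraction=0.95):
    """Flag samples whose score exceeds the keep_fraction quantile of the
    in-distribution scores. Returns the boolean flags and the threshold."""
    threshold = np.quantile(in_scores, keep_fraction)
    return np.asarray(scores) > threshold, threshold

# Synthetic stand-in scores for illustration, not real detector output
rng = np.random.default_rng(0)
in_scores = rng.normal(loc=0.0, scale=1.0, size=1000)
ood_scores = rng.normal(loc=3.0, scale=1.0, size=1000)

print("AUROC:", auroc(in_scores, ood_scores))
flags, threshold = flag_ood(ood_scores, in_scores)
print("threshold:", threshold, "fraction of OOD flagged:", flags.mean())
```

With the arrays from the example, mnist_scores would take the place of in_scores and cifar_scores the place of ood_scores. The pairwise comparison uses memory proportional to the product of the two array sizes, so for full test sets a rank-based routine such as scikit-learn's roc_auc_score is likely a better fit.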