Abstract
Quantifying uncertainty is one of the major challenges in modern deep learning. Artificial Intelligence models could be deployed with greater confidence if solid methods existed for identifying and quantifying their uncertainty. This article proposes two alternative methods for outlier detection in Bayesian Neural Networks used for classification tasks. Both work by looking for unusual pre-activation neuron values in the last layer of a Bayesian Neural Network.
The proposed methods are compared to a baseline method for outlier detection, Predictive Entropy, on three datasets: a simulated dataset, the MNIST dataset, and the Breast Cancer Wisconsin dataset. In addition, we introduce a method for separating In-Between and Out-Of-Distribution outliers.
The results indicate that the proposed methods' performance depends on the dataset type. On the simple simulated dataset, the proposed methods outperform the baseline method, and they also outperform it on the Breast Cancer Wisconsin dataset. However, on the MNIST image dataset, the baseline performs better than the proposed methods. As expected, for all three datasets, an OR-combination of the two methods gives a slightly higher power (true positive rate) than either method separately, but this comes at the price of a higher false discovery rate (FDR).