Ko-Jen Hsiao, Kevin S. Xu, Jeff Calder, Alfred O. Hero III
We consider the problem of identifying patterns in a data set that exhibit anomalous behavior, often referred to as anomaly detection. Similarity-based anomaly detection algorithms detect abnormally large amounts of dissimilarity, e.g., as measured by the average k-nearest neighbor Euclidean distance between a test sample and the training samples. However, in many application domains there may not exist a single dissimilarity measure that captures all possible anomalous patterns. In such a case, multiple dissimilarity measures can be defined, including non-metric measures, and one can test for anomalies by scalarizing them using a non-negative linear combination. If the relative importance of the different dissimilarity measures is not known in advance, as in many anomaly detection applications, the anomaly detection algorithm may need to be executed multiple times with different choices of weights in the linear combination. In this paper, we propose a method for similarity-based anomaly detection using a novel multi-criteria dissimilarity measure, the Pareto depth. The proposed Pareto depth analysis (PDA) anomaly detection algorithm uses the concept of Pareto optimality to detect anomalies under multiple criteria without having to run an algorithm multiple times with different choices of weights. The proposed PDA approach is provably better than using linear combinations of the criteria and shows superior performance on experiments with synthetic and real data sets.
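To make the contrast between Pareto optimality and linear scalarization concrete, here is a minimal Python sketch. It is an illustration only and not the authors' Matlab implementation: the function names `dominates`, `pareto_depths`, and `scalarization_winners`, the toy points, and the weight grid are all assumptions made for this example. Each point is a tuple of dissimilarity values (smaller is better), and its depth is taken to be the index of the Pareto front it lies on when non-dominated points are peeled off repeatedly. How the paper turns such depths into an anomaly score is not shown here.

```python
def dominates(a, b):
    """True if a is no larger than b in every criterion and strictly smaller in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_depths(points):
    """Return, for each point, the 1-based index of the Pareto front it lies on.

    Front 1 is the set of non-dominated points; front 2 is the set of
    non-dominated points once front 1 is removed, and so on.
    """
    depth = [0] * len(points)
    remaining = set(range(len(points)))
    level = 0
    while remaining:
        level += 1
        front = {
            i for i in remaining
            if not any(dominates(points[j], points[i]) for j in remaining if j != i)
        }
        for i in front:
            depth[i] = level
        remaining -= front
    return depth


def scalarization_winners(points, steps=100):
    """Indices that minimize w*c1 + (1-w)*c2 for some weight w on a grid over [0, 1]."""
    winners = set()
    for k in range(steps + 1):
        w = k / steps
        best = min(range(len(points)),
                   key=lambda i: w * points[i][0] + (1 - w) * points[i][1])
        winners.add(best)
    return winners


# Hypothetical two-criterion dissimilarities, chosen only for illustration.
points = [(1.0, 4.0), (2.6, 2.6), (4.0, 1.0), (3.0, 3.0), (5.0, 5.0)]

depths = pareto_depths(points)
winners = scalarization_winners(points)

print(depths)           # [1, 1, 1, 2, 3]
print(sorted(winners))  # [0, 2]
```

In this toy data the point (2.6, 2.6) lies on the first Pareto front, yet no weight on the grid makes it the minimizer of the weighted sum, because it sits in a non-convex part of the front. A single pass of front peeling recovers it without any sweep over weights.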
The Matlab code for Pareto Depth Analysis can be downloaded here.
Comments and remarks
If you find any bugs or errors, you may report them to the first author of the paper. The email of Ko-Jen Hsiao is firstname.lastname@example.org