Automatic optimization of outlier detection ensembles using a limited number of outlier examples

Cited by: 0
Authors
Niko Reunanen
Tomi Räty
Timo Lintonen
Affiliations
[1] Hellon Oy
[2] VTT Technical Research Centre of Finland
Source
International Journal of Data Science and Analytics | 2020 / Vol. 10
Keywords
Bagging; Outlier detection; Outlier detection ensemble; Semi-supervised outlier detection
DOI
Not available
Abstract
In data analysis, outliers are deviating and unexpected observations. Outlier detection is important because outliers can contain critical and interesting information. We propose an approach for optimizing outlier detection ensembles using a limited number of outlier examples. In our work, a limited number of outlier examples is defined as 1 to 10% of the available outliers. The optimized outlier detection ensembles consist of outlier detection algorithms that provide an outlier score and have adjustable parameters. The automatic optimization determines the parameter values that enhance the discrimination between inliers and outliers, which increases the efficiency of outlier detection. Because outliers are rare by definition, an optimization that needs only a few examples is beneficial: obtaining outlier examples can be prohibitively challenging, so the available examples should be used efficiently.
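The abstract does not spell out the optimization procedure. As a rough illustration only, here is a minimal sketch under several assumptions: a single hypothetical detector (distance to the k-th nearest neighbour, with k as the adjustable parameter); ensemble members formed by scoring against random subsamples of the data and averaging (a subsampling variant of bagging); and k chosen by random search, where each candidate is scored by the ROC AUC of the few labeled outliers against all remaining, unlabeled points. The function names, detector, subsampling fraction and search strategy are all illustrative choices, not the authors' method.

```python
import math
import random


def knn_score(i, data, ref_idx, k):
    """Hypothetical detector: distance from point i to its k-th nearest neighbour in the reference set."""
    dists = sorted(math.dist(data[i], data[j]) for j in ref_idx if j != i)
    return dists[min(k, len(dists)) - 1]


def ensemble_scores(data, k, n_members, frac, rng):
    """Average k-NN scores over members that each use a random subsample (without replacement) as reference."""
    n = len(data)
    m = min(n, max(k + 1, int(frac * n)))  # subsample size, large enough to hold k neighbours
    totals = [0.0] * n
    for _ in range(n_members):
        ref_idx = rng.sample(range(n), m)
        for i in range(n):
            totals[i] += knn_score(i, data, ref_idx, k)
    return [t / n_members for t in totals]


def auc(scores, outlier_idx):
    """ROC AUC (Mann-Whitney form): labeled outliers vs. all other points; ties count 0.5."""
    pos = [scores[i] for i in outlier_idx]
    neg = [s for i, s in enumerate(scores) if i not in outlier_idx]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def optimize_k(data, outlier_idx, k_range, n_trials, n_members=10, frac=0.5, seed=0):
    """Random search over k; keeps the value whose ensemble scores best separate the labeled outliers."""
    rng = random.Random(seed)
    best_k, best_auc = None, -1.0
    for _ in range(n_trials):
        k = rng.randint(k_range[0], k_range[1])
        score = auc(ensemble_scores(data, k, n_members, frac, rng), outlier_idx)
        if score > best_auc:
            best_k, best_auc = k, score
    return best_k, best_auc


# Toy data: a 7x7 grid of inliers plus three distant outliers, of which only one is labeled.
inliers = [(0.5 * x, 0.5 * y) for x in range(-3, 4) for y in range(-3, 4)]
outliers = [(8.0, 8.0), (-8.0, 7.0), (9.0, -8.0)]
data = inliers + outliers
labeled = {len(inliers)}  # index of (8.0, 8.0): the single known outlier example

best_k, best_auc = optimize_k(data, labeled, k_range=(1, 10), n_trials=8)
print(f"best k = {best_k}, AUC on labeled outliers = {best_auc:.3f}")
```

Treating every unlabeled point as an inlier is a simplification: the two unlabeled outliers in the toy data count as negatives, so the AUC reaches 1.0 only if the labeled outlier also outscores them. Swapping random search for grid search, or the k-NN detector for another scored detector, would keep the same structure.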
Pages: 377-394
Page count: 17