Calibration of One-Class SVM for MV set estimation

Cited by: 0
Authors
Thomas, Albert [1 ,2 ]
Feuillard, Vincent [1 ]
Gramfort, Alexandre [2 ]
Affiliations
[1] Airbus Grp Innovat, 12 Rue Pasteur, F-92150 Suresnes, France
[2] Univ Paris Saclay, Telecom Paris Tech, CNRS, LTCI, F-75013 Paris, France
Source
PROCEEDINGS OF THE 2015 IEEE INTERNATIONAL CONFERENCE ON DATA SCIENCE AND ADVANCED ANALYTICS (IEEE DSAA 2015) | 2015
DOI
Not available
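Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812 ;
Abstract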
A general approach for anomaly detection or novelty detection consists in estimating high-density regions or Minimum Volume (MV) sets. The One-Class Support Vector Machine (OCSVM) is a state-of-the-art algorithm for estimating such regions from high-dimensional data. Yet it suffers from practical limitations. When applied to a limited number of samples, it can lead to poor performance even when picking the best hyperparameters. Moreover, the solution of the OCSVM is very sensitive to the selection of hyperparameters, which makes it hard to optimize in an unsupervised setting. We present a new approach to estimate MV sets using the OCSVM with a different choice of the parameter controlling the proportion of outliers. The solution function of the OCSVM is learnt on a training set, and the desired probability mass is obtained by adjusting the offset on a test set to prevent overfitting. Models learnt on different train/test splits are then aggregated to reduce the variance induced by such random splits. Our approach makes it possible to tune the hyperparameters automatically and obtain nested set estimates. Experimental results show that our approach outperforms the standard OCSVM formulation while suffering less from the curse of dimensionality than kernel density estimates. Results on actual data sets are also presented.
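The calibration idea described in the abstract (learn a scoring function on one split, set the offset on a held-out split so that the estimated set captures the desired probability mass, then aggregate over several random splits) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: `fit_scorer` is a hypothetical stand-in (mean Gaussian-kernel similarity to the training points) rather than an actual OCSVM, the names `calibrate_offset` and `aggregated_scorer` are invented here, and averaging the offset-corrected scores is one plausible reading of "aggregated", which the abstract does not specify.

```python
import math
import random


def fit_scorer(train, gamma=0.5):
    """Hypothetical stand-in for the OCSVM solution function (NOT an OCSVM):
    mean Gaussian-kernel similarity of x to the training points."""
    def score(x):
        return sum(
            math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, t)))
            for t in train
        ) / len(train)
    return score


def calibrate_offset(score, held_out, alpha):
    """Return an offset rho such that roughly a fraction alpha of the
    held-out points satisfy score(x) >= rho."""
    values = sorted(score(x) for x in held_out)
    # Index of the empirical (1 - alpha)-quantile of the held-out scores.
    k = int(math.floor((1.0 - alpha) * len(values)))
    k = min(max(k, 0), len(values) - 1)
    return values[k]


def aggregated_scorer(data, alpha, n_splits=10, seed=0):
    """Fit on random halves, calibrate the offset on the other halves,
    and average the offset-corrected scores (an assumed aggregation rule)."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_splits):
        shuffled = data[:]
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        train, held_out = shuffled[:half], shuffled[half:]
        score = fit_scorer(train)
        rho = calibrate_offset(score, held_out, alpha)
        models.append((score, rho))

    def decision(x):
        # A value >= 0 means x is placed inside the estimated set.
        return sum(s(x) - rho for s, rho in models) / len(models)
    return decision


# Toy usage: 200 points from a 2-D standard normal, target mass 0.9.
rng = random.Random(42)
data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
decision = aggregated_scorer(data, alpha=0.9)
inside_fraction = sum(decision(x) >= 0 for x in data) / len(data)
```

In practice the stand-in scorer would be replaced by the decision function of a fitted One-Class SVM. Because the offset is set on points the scorer has not seen, the captured mass is not biased by the training fit, which is the overfitting concern the abstract raises.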
Pages: 75-83
Page count: 9