Fast AUC Maximization Learning Machine With Simultaneous Outlier Detection

Cited by: 2
Authors
Sun, Yichen [1, 2]
Vong, Chi Man [3]
Wang, Shitong [1, 2]
Affiliations
[1] Jiangnan Univ, Sch Artificial Intelligence & Comp Sci, Wuxi 214122, Jiangsu, Peoples R China
[2] Taihu Jiangsu Key Construct Lab IoT Applicat Tech, Wuxi 214122, Jiangsu, Peoples R China
[3] Univ Macau, Fac Sci & Technol, Dept Comp & Informat Sci, Macau, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Support vector machines; Task analysis; Kernel; Time complexity; Anomaly detection; Upper bound; Training; AUC maximization; imbalance classification; minimum enclosing ball (MEB); outlier detection; ROC CURVE; AREA;
DOI
10.1109/TCYB.2022.3164900
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
While the AUC-maximizing support vector machine (AUCSVM) has been developed to solve imbalanced classification tasks, its huge computational burden makes it impractical, or even computationally prohibitive, for medium- or large-scale imbalanced data. In addition, in practical application scenarios such as medical diagnosis, the minority class sometimes carries extremely important information for users, or is corrupted by noise and/or outliers. This motivates generalizing the AUC concept to reflect such importance, or the upper bound on the noise or outliers. To address these issues, this study proposes a fast AUC-maximizing learning machine with simultaneous outlier detection, called rho-AUCCVM, which combines the generalized AUC metric with the core vector machine (CVM) technique. rho-AUCCVM has two notable merits: 1) it shares the CVM's advantage, that is, asymptotically linear time complexity with respect to the total number of sample pairs, together with space complexity independent of the total number of sample pairs; and 2) it can automatically determine the importance of the minority class (assuming no noise) or the upper bound on the noise or outliers. Extensive experimental results on benchmark imbalanced datasets verify these advantages of rho-AUCCVM.
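As background for the "sample pairs" mentioned in the abstract, here is a minimal sketch of the standard pairwise (Wilcoxon-Mann-Whitney) view of AUC: the fraction of (positive, negative) score pairs that are ranked correctly. This is not the rho-AUCCVM method or its generalized AUC metric; the function name `pairwise_auc` and the toy scores are assumptions made for illustration only.

```python
import numpy as np


def pairwise_auc(scores_pos, scores_neg):
    """Empirical AUC as the fraction of correctly ranked (positive, negative) pairs.

    Hypothetical helper for illustration; ties count as half a correct pair.
    """
    pos = np.asarray(scores_pos, dtype=float)[:, None]  # shape (n_pos, 1)
    neg = np.asarray(scores_neg, dtype=float)[None, :]  # shape (1, n_neg)
    diff = pos - neg                                    # all n_pos * n_neg pairs
    correct = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(correct / diff.size)


# Toy scores (made up): 2 positives and 2 negatives give 2 * 2 = 4 pairs.
auc = pairwise_auc([0.9, 0.8], [0.1, 0.85])
print(auc)
```

Because the number of pairs is the product of the two class sizes, any method that works on pairs directly grows quickly with dataset size, which is the cost the abstract refers to.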
Pages: 6843-6857
Page count: 15