One-pass AUC optimization

被引:21
作者
Gao, Wei [1 ,2 ]
Wang, Lu [1 ,2 ]
Jin, Rong [3 ,4 ]
Zhu, Shenghuo [4 ]
Zhou, Zhi-Hua [1 ,2 ]
机构
[1] Nanjing Univ, Natl Key Lab Novel Software Technol, Nanjing 210023, Jiangsu, Peoples R China
[2] Collaborat Innovat Ctr Novel Software Technol & I, Nanjing 210023, Jiangsu, Peoples R China
[3] Michigan State Univ, Dept Comp Sci & Engn, E Lansing, MI 48824 USA
[4] Alibaba Grp, Inst Data Sci & Technol, Seattle, WA USA
基金
美国国家科学基金会;
关键词
AUC; ROC curve; Online learning; Large-scale learning; Least square loss; Random projection; GENERALIZATION BOUNDS; RANKING; AREA;
D O I
10.1016/j.artint.2016.03.003
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
AUC is an important performance measure that has been used in diverse tasks, such as class-imbalanced learning, cost-sensitive learning, learning to rank, etc. In this work, we focus on one-pass AUC optimization that requires going through training data only once without having to store the entire training dataset. Conventional online learning algorithms cannot be applied directly to one-pass AUC optimization because AUC is measured by a sum of losses defined over pairs of instances from different classes. We develop a regression-based algorithm which only needs to maintain the first and second order statistics of training data in memory, resulting in a storage requirement independent of the number of training data. To efficiently handle high-dimensional data, we develop two deterministic algorithms that approximate the covariance matrices. We verify, both theoretically and empirically, the effectiveness of the proposed algorithms. (C) 2016 Elsevier B.V. All rights reserved.
引用
收藏
页码:1 / 29
页数:29
相关论文
共 53 条
  • [31] Using AUC and accuracy in evaluating learning algorithms
    Huang, J
    Ling, CX
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2005, 17 (03) : 299 - 310
  • [32] Joachims T., 2006, P 12 ACM SIGKDD INT, P217, DOI [DOI 10.1145/1150402.1150429, 10.1145/1150402.1150429]
  • [33] Kar P., 2013, P MACHINE LEARNING R, P441
  • [34] Kotlowski W., 2011, ICML, P1113
  • [35] Langford J, 2009, J MACH LEARN RES, V10, P777
  • [36] Simple and Deterministic Matrix Sketching
    Liberty, Edo
    [J]. 19TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING (KDD'13), 2013, : 581 - 588
  • [37] Exploratory Undersampling for Class-Imbalance Learning
    Liu, Xu-Ying
    Wu, Jianxin
    Zhou, Zhi-Hua
    [J]. IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART B-CYBERNETICS, 2009, 39 (02): : 539 - 550
  • [38] McDiarmid C., 1989, SURVEYS COMBINATORIC, P148, DOI DOI 10.1017/CBO9781107359949.008
  • [39] Menon Aditya Krishna, 2014, P 27 ANN C LEARN THE, V35, P68
  • [40] BASIC PRINCIPLES OF ROC ANALYSIS
    METZ, CE
    [J]. SEMINARS IN NUCLEAR MEDICINE, 1978, 8 (04) : 283 - 298