Large-Scale Nonlinear AUC Maximization via Triply Stochastic Gradients

Cited by: 15
Authors
Dang, Zhiyuan [1,2]
Li, Xiang [3]
Gu, Bin [4,5]
Deng, Cheng [1]
Huang, Heng [5,6]
Affiliations
[1] Xidian Univ, Sch Elect Engn, Xian 710071, Shaanxi, Peoples R China
[2] JD Digits, Beijing 100176, Peoples R China
[3] Univ Western Ontario, London, ON N6A 3K7, Canada
[4] Mohamed Bin Zayed Univ Artificial Intelligence, Dept Machine Learning, Abu Dhabi, U Arab Emirates
[5] JD Finance Amer Corp, Mountain View, CA 94043 USA
[6] Univ Pittsburgh, Dept Elect & Comp Engn, Pittsburgh, PA 15260 USA
Funding
National Key Research and Development Program of China;
Keywords
AUC maximization; random Fourier features; kernel methods; Nystrom method; online;
DOI
10.1109/TPAMI.2020.3024987
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
Learning to improve AUC performance on imbalanced data is an important machine learning research problem. Most AUC maximization methods assume that the model is linear in the original feature space, an assumption that is unsuitable for nonlinearly separable problems. Although some nonlinear AUC maximization methods exist, scaling up nonlinear AUC maximization remains an open question. To address this challenging problem, we propose a novel large-scale nonlinear AUC maximization method (named TSAM) based on triply stochastic gradient descent. Specifically, we first use random Fourier features to approximate the kernel function. We then iteratively update the solution with triply stochastic gradients taken w.r.t. the pairwise loss and the random features. Finally, we prove that TSAM converges to the optimal solution at a rate of O(1/t) after t iterations. Experimental results on a variety of benchmark datasets not only confirm the scalability of TSAM, but also show a significant reduction in computational time compared with existing batch learning algorithms, while retaining similar generalization performance.
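The abstract describes the mechanism only at a high level, so here is a minimal Python sketch of the triply stochastic idea: each iteration draws a random positive example, a random negative example, and a fresh seeded block of random Fourier features. This is an illustration assuming an RBF kernel, a pairwise hinge loss, and a constant step size; tsam_sketch and all its parameter names are hypothetical, not the authors' implementation.

import numpy as np

def tsam_sketch(X_pos, X_neg, T=1000, D=32, sigma=1.0, eta=0.01, lam=1e-4):
    """Hedged sketch of triply stochastic AUC maximization.

    Three random draws per iteration: a positive example, a negative
    example, and a fresh block of D random Fourier features (stored as
    a seed so the block can be regenerated at prediction time).
    """
    d = X_pos.shape[1]
    rng = np.random.default_rng(0)
    seeds, alphas = [], []

    def phi(x, seed):
        # Random Fourier features for an RBF kernel:
        # sqrt(2/D) * cos(Wx + b), W ~ N(0, sigma^-2 I), b ~ U[0, 2*pi].
        r = np.random.default_rng(seed)
        W = r.normal(0.0, 1.0 / sigma, size=(D, d))
        b = r.uniform(0.0, 2.0 * np.pi, size=D)
        return np.sqrt(2.0 / D) * np.cos(W @ x + b)

    def f(x):
        # Model value: sum of contributions from all past feature blocks.
        return sum(a @ phi(x, s) for s, a in zip(seeds, alphas))

    for _ in range(T):
        xp = X_pos[rng.integers(len(X_pos))]   # random positive example
        xn = X_neg[rng.integers(len(X_neg))]   # random negative example
        seed = int(rng.integers(2**31))        # fresh random feature block
        margin = f(xp) - f(xn)
        for a in alphas:                       # l2-regularization shrinkage
            a *= 1.0 - eta * lam
        # Subgradient step on the pairwise hinge loss max(0, 1 - margin).
        g = phi(xp, seed) - phi(xn, seed) if margin < 1.0 else np.zeros(D)
        seeds.append(seed)
        alphas.append(eta * g)
    return f

As in doubly stochastic gradient methods, evaluating f costs time linear in the number of past iterations, so a practical implementation would batch feature blocks and compact the coefficient history; the paper's exact loss, step-size schedule, and convergence constants should be taken from the text itself.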
Pages: 1385-1398 (14 pages)