Large-scale Stochastic Optimization of NDCG Surrogates for Deep Learning with Provable Convergence

Cited by: 0
Authors
Qiu, Zi-Hao [1]
Hu, Quanqi [2 ]
Zhong, Yongjian [2 ]
Zhang, Lijun [1 ]
Yang, Tianbao [2 ]
Affiliations
[1] Nanjing Univ, Natl Key Lab Novel Software Technol, Nanjing, Peoples R China
[2] Univ Iowa, Iowa City, IA 52242 USA
Source
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162 | 2022
Keywords
(none)
DOI
Not available
CLC number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
NDCG, namely Normalized Discounted Cumulative Gain, is a widely used ranking metric in information retrieval and machine learning. However, efficient and provable stochastic methods for maximizing NDCG are still lacking, especially for deep models. In this paper, we propose a principled approach to optimizing NDCG and its top-K variant. First, we formulate a novel compositional optimization problem for optimizing the NDCG surrogate, and a novel bilevel compositional optimization problem for optimizing the top-K NDCG surrogate. Then, we develop efficient stochastic algorithms with provable convergence guarantees for the non-convex objectives. Unlike existing NDCG optimization methods, the per-iteration complexity of our algorithms scales with the mini-batch size instead of the total number of items. To improve effectiveness for deep learning, we further propose practical strategies that use an initial warm-up and a stop-gradient operator. Experimental results on multiple datasets demonstrate that our methods outperform prior ranking approaches in terms of NDCG. To the best of our knowledge, this is the first work to propose stochastic algorithms that optimize NDCG with a provable convergence guarantee. Our proposed methods are implemented in the LibAUC library at https://libauc.org/.
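For context, the standard definition of NDCG that the abstract refers to is sketched below in LaTeX (standard notation, not copied from the paper):

```latex
% NDCG for a single query with N items, relevance labels y_i,
% and model scores s_i = h_w(x_i); r(i) is the rank of item i:
%   r(i) = 1 + \sum_{j \neq i} \mathbb{1}[s_j > s_i]
\mathrm{DCG}(\mathbf{w}) = \sum_{i=1}^{N} \frac{2^{y_i} - 1}{\log_2\bigl(1 + r(i)\bigr)},
\qquad
\mathrm{NDCG}(\mathbf{w}) = \frac{\mathrm{DCG}(\mathbf{w})}{\mathrm{IDCG}}
```

Here IDCG is the DCG of the ideal ordering (items sorted by relevance), and the top-K variant restricts the sum to the K highest-ranked items. Since r(i) is itself a sum over all N items, a smooth NDCG surrogate has the compositional form f(g(w)) with g an inner average over items; a naive mini-batch gradient of such a composition is biased, which is the difficulty the compositional and bilevel formulations in the abstract address.

The PyTorch sketch below shows a common sigmoid-rank NDCG surrogate of this kind. It is a minimal illustration of the objective family, not the authors' stochastic compositional algorithm; the function name smoothed_ndcg_loss and the temperature parameter tau are illustrative assumptions:

```python
import torch

def smoothed_ndcg_loss(scores: torch.Tensor, labels: torch.Tensor,
                       tau: float = 1.0) -> torch.Tensor:
    """Negative smoothed NDCG for one query (illustrative sketch only).

    The hard rank r(i) = 1 + sum_{j != i} 1[s_j > s_i] is replaced by a
    sigmoid approximation; this is NOT the exact surrogate of the paper.
    """
    # diff[i, j] = s_j - s_i, so sigmoid(diff / tau) smooths 1[s_j > s_i].
    diff = scores.unsqueeze(0) - scores.unsqueeze(1)
    # The sum over j includes j = i, contributing sigmoid(0) = 0.5;
    # the 0.5 offset below recovers r(i) ~ 1 + sum_{j != i} sigmoid(.).
    ranks = 0.5 + torch.sigmoid(diff / tau).sum(dim=1)
    gains = torch.pow(2.0, labels.to(scores.dtype)) - 1.0
    dcg = (gains / torch.log2(1.0 + ranks)).sum()
    # Ideal DCG: gains sorted in decreasing order at ranks 1..N (constant
    # with respect to the scores, so no gradient flows through it).
    ideal_gains, _ = torch.sort(gains, descending=True)
    positions = torch.arange(1, labels.numel() + 1,
                             dtype=scores.dtype, device=scores.device)
    idcg = (ideal_gains / torch.log2(1.0 + positions)).sum()
    return -dcg / idcg  # minimize the negative surrogate


# Usage sketch: scores over all items of one query from a model.
# loss = smoothed_ndcg_loss(model(features), labels); loss.backward()
```

Note that this sketch sums over all N items per step; the abstract's key point is precisely that the proposed algorithms avoid this, with per-iteration cost scaling in the mini-batch size.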
Pages: 31
Related papers
(50 records in total)
  • [31] Distributed Computational Framework for Large-Scale Stochastic Convex Optimization
    Rostampour, Vahab
    Keviczky, Tamas
    ENERGIES, 2021, 14 (01)
  • [32] LARGE-SCALE NONCONVEX STOCHASTIC OPTIMIZATION BY DOUBLY STOCHASTIC SUCCESSIVE CONVEX APPROXIMATION
    Mokhtari, Aryan
    Koppel, Alec
    Scutari, Gesualdo
    Ribeiro, Alejandro
    2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 4701 - 4705
  • [33] New scenario decomposition method for large-scale stochastic optimization
    Mulvey, John M.
    Ruszczynski, Andrzej
    OPERATIONS RESEARCH, 1995, 43 (03) : 477 - 490
  • [34] Provable Super-Convergence With a Large Cyclical Learning Rate
    Oymak, Samet
    IEEE SIGNAL PROCESSING LETTERS, 2021, 28 : 1645 - 1649
  • [36] A STOCHASTIC QUASI-NEWTON METHOD FOR LARGE-SCALE OPTIMIZATION
    Byrd, R. H.
    Hansen, S. L.
    Nocedal, J.
    Singer, Y.
    SIAM JOURNAL ON OPTIMIZATION, 2016, 26 (02) : 1008 - 1031
  • [37] Deep learning for the large-scale cancer data analysis
    Tsuji, Shingo
    Aburatani, Hiroyuki
    CANCER RESEARCH, 2015, 75 (22)
  • [38] Deep Reinforcement Learning for Large-Scale Epidemic Control
    Libin, Pieter J. K.
    Moonens, Arno
    Verstraeten, Timothy
    Perez-Sanjines, Fabian
    Hens, Niel
    Lemey, Philippe
    Nowe, Ann
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES: APPLIED DATA SCIENCE AND DEMO TRACK, ECML PKDD 2020, PT V, 2021, 12461 : 155 - 170
  • [39] Deep learning large-scale drug discovery and repurposing
    Yu, Min
    Li, Weiming
    Yu, Yunru
    Zhao, Yu
    Xiao, Lizhi
    Lauschke, Volker M.
    Cheng, Yiyu
    Zhang, Xingcai
    Wang, Yi
    NATURE COMPUTATIONAL SCIENCE, 2024, 4 (08): : 600 - 614
  • [40] HammingMesh: A Network Topology for Large-Scale Deep Learning
    Hoefler, Torsten
    Bonato, Tommaso
    De Sensi, Daniele
    Di Girolamo, Salvatore
    Li, Shigang
    Heddes, Marco
    Belk, Jon
    Goel, Deepak
    Castro, Miguel
    Scott, Steve
    SC22: INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS, 2022