Improving Training Stability for Multitask Ranking Models in Recommender Systems

Cited by: 4
Authors
Tang, Jiaxi [1 ]
Drori, Yoel [2 ]
Chang, Daryl [3 ]
Sathiamoorthy, Maheswaran [1 ]
Gilmer, Justin [1 ]
Wei, Li [3 ]
Yi, Xinyang [1 ]
Hong, Lichan [1 ]
Chi, Ed H. [1 ]
Affiliations
[1] Google DeepMind, Mountain View, CA 94043 USA
[2] Google Research, Tel Aviv, Israel
[3] Google Inc, Mountain View, CA USA
Source
PROCEEDINGS OF THE 29TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2023 | 2023
Keywords
Recommender System; Optimization; Training Stability
DOI
10.1145/3580305.3599846
Chinese Library Classification (CLC)
TP [Automation technology; computer technology]
Subject Classification Code
0812
Abstract
Recommender systems play an important role in many content platforms. While most recommendation research is dedicated to designing better models to improve user experience, we found that research on stabilizing the training of such models is severely under-explored. As recommendation models become larger and more sophisticated, they become more susceptible to training instability, i.e., loss divergence, which can render a model unusable, waste significant resources, and block model development. In this paper, we share the findings and best practices we learned from improving the training stability of a real-world multitask ranking model used for YouTube recommendations. We show some properties of the model that lead to unstable training and conjecture on their causes. Furthermore, based on our observations of the training dynamics near the point of instability, we hypothesize why existing solutions fail and propose a new algorithm that mitigates their limitations. Our experiments on a YouTube production dataset show that the proposed algorithm significantly improves training stability without compromising convergence, compared with several commonly used baseline methods. We open-source our implementation at https://github.com/tensorflow/recommenders/tree/main/tensorflow_recommenders/experimental/optimizers/clippy_adagrad.py.
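The open-sourced optimizer linked above (Clippy AdaGrad) bounds each training step relative to the current parameter magnitudes. As a rough illustration of that idea, here is a minimal NumPy sketch; the function name clippy_scale and the thresholds lam_rel and lam_abs are illustrative assumptions, not the paper's exact formulation or constants.

```python
import numpy as np

def clippy_scale(w, update, lam_rel=0.5, lam_abs=1e-2):
    """Global scale factor in (0, 1] such that, after scaling, each
    coordinate of `update` changes w[i] by at most
    lam_rel * |w[i]| + lam_abs."""
    bound = lam_rel * np.abs(w) + lam_abs      # allowed change per coordinate
    ratios = bound / (np.abs(update) + 1e-12)  # how much each coordinate permits
    return float(min(1.0, ratios.min()))       # most restrictive coordinate wins

# Toy usage: one AdaGrad-style step with the clipping applied.
rng = np.random.default_rng(0)
w = rng.normal(size=5)
grad = 10.0 * rng.normal(size=5)       # deliberately large, destabilizing gradient
accum = np.ones_like(w) + grad ** 2    # AdaGrad accumulator (stand-in state)
update = 0.1 * grad / np.sqrt(accum)   # raw AdaGrad update with learning rate 0.1
scale = clippy_scale(w, update)
w -= scale * update
print(f"applied scale: {scale:.3f}")
```

Scaling the whole update by a single factor, rather than clipping each coordinate independently, preserves the update direction; this is what distinguishes this style of clipping from naive element-wise clipping.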
Pages: 4882-4893
Page count: 12