PERSIA: An Open, Hybrid System Scaling Deep Learning-based Recommenders up to 100 Trillion Parameters

Cited by: 17
Authors
Lian, Xiangru [1 ]
Yuan, Binhang [2 ]
Zhu, Xuefeng [3 ]
Wang, Yulong [3 ]
He, Yongjun [2 ]
Wu, Honghuan [3 ]
Sun, Lei [3 ]
Lyu, Haodong [3 ]
Liu, Chengjun [3 ]
Dong, Xing [3 ]
Liao, Yiqiao [3 ]
Luo, Mingnan [3 ]
Zhang, Congfei [3 ]
Xie, Jingru [3 ]
Li, Haonan [3 ]
Chen, Lei [3 ]
Huang, Renjie [3 ]
Lin, Jianying [3 ]
Shu, Chengchun [3 ]
Qiu, Xuezhong [3 ]
Liu, Zhishan [3 ]
Kong, Dongying [3 ]
Yuan, Lei [3 ]
Yu, Hai [3 ]
Yang, Sen [3 ]
Zhang, Ce [2 ]
Liu, Ji [1 ]
Affiliations
[1] Kwai Inc, Palo Alto, CA 94306 USA
[2] ETH, Zurich, Switzerland
[3] Kuaishou, Beijing, Peoples R China
Source
PROCEEDINGS OF THE 28TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2022 | 2022
Keywords
Recommendation system
DOI
10.1145/3534678.3539070
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
Recent years have witnessed exponential growth in the scale of deep learning-based recommender models: from Google's 2016 model with 1 billion parameters to Facebook's latest model with 12 trillion parameters. Each jump in model capacity has brought a significant quality boost, which makes us believe the era of 100 trillion parameters is around the corner. However, training such models is challenging even within industrial-scale data centers. We resolve this challenge through careful co-design of both the optimization algorithm and the distributed system architecture. Specifically, to ensure both training efficiency and training accuracy, we design a novel hybrid training algorithm in which the embedding layer and the dense neural network are handled by different synchronization mechanisms; we then build a system called Persia (short for parallel recommendation training system with hybrid acceleration) to support this hybrid training algorithm. Both theoretical analysis and empirical studies with up to 100 trillion parameters have been conducted to justify the system design and implementation of Persia. We make Persia publicly available (at github.com/PersiaML/Persia) so that anyone can easily train a recommender model at the scale of 100 trillion parameters.
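The abstract's core idea, embedding parameters updated asynchronously while the dense network stays fully synchronous, can be sketched in a few lines. This is a minimal illustrative simulation, not the actual Persia API; all names and the two-worker setup are assumptions for the example.

```python
# Hybrid synchronization sketch: workers write sparse embedding
# gradients into a shared table without coordination (asynchronous),
# while dense gradients are averaged across workers before a single
# step is applied (synchronous all-reduce). Illustrative only.
import numpy as np

rng = np.random.default_rng(0)

# "Embedding table": in practice enormous and sparse; tiny here.
embeddings = rng.normal(size=(8, 4))
# "Dense network": one weight matrix shared by all workers.
dense_w = rng.normal(size=(4, 1))

def async_embedding_update(row_ids, grads, lr=0.1):
    """Asynchronous update: each worker applies its sparse gradient
    directly to the shared table, without waiting for the others."""
    for rid, g in zip(row_ids, grads):
        embeddings[rid] -= lr * g

def sync_dense_update(worker_grads, lr=0.1):
    """Synchronous update: average the dense gradients from all
    workers (an all-reduce), then apply one step."""
    global dense_w
    avg = np.mean(worker_grads, axis=0)
    dense_w = dense_w - lr * avg

# Two simulated workers touch different embedding rows independently...
async_embedding_update([0, 1], [np.ones(4), np.ones(4)])
async_embedding_update([2], [2 * np.ones(4)])

# ...but contribute to a single synchronous dense step.
dense_w_before = dense_w.copy()
g1 = np.ones((4, 1))
g2 = 3 * np.ones((4, 1))
sync_dense_update([g1, g2])  # effective gradient is the mean of g1, g2
```

The design point the paper exploits is that the embedding layer dominates the parameter count (up to 100 trillion parameters) but each sample touches only a few rows, so relaxed asynchronous updates cost little accuracy, whereas the comparatively small dense network is kept synchronous to preserve convergence quality.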
Pages: 3288-3298
Page count: 11