StackRec: Efficient Training of Very Deep Sequential Recommender Models by Iterative Stacking

Cited by: 16
Authors
Wang, Jiachun [1 ,4 ]
Yuan, Fajie [2 ,3 ]
Chen, Jian [1 ]
Wu, Qingyao [1 ]
Yang, Min [4 ]
Sun, Yang [4 ]
Zhang, Guoxiao [3 ]
Affiliations
[1] South China Univ Technol, Sch Software Engn, Guangzhou, Guangdong, Peoples R China
[2] Westlake Univ, Hangzhou, Zhejiang, Peoples R China
[3] Tencent, Shenzhen, Guangdong, Peoples R China
[4] Chinese Acad Sci, Shenzhen Inst Adv Technol, Shenzhen, Guangdong, Peoples R China
Source
SIGIR '21 - PROCEEDINGS OF THE 44TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL | 2021
Funding
National Natural Science Foundation of China
Keywords
Recommender systems; Knowledge transfer; Training acceleration
DOI
10.1145/3404835.3462890
CLC Number
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
Deep learning has brought great progress to sequential recommendation (SR) tasks. With advanced network architectures, sequential recommender models can be stacked with many hidden layers, e.g., up to 100 layers on real-world recommendation datasets. Training such a deep network is difficult, however, because it is computationally expensive and time-consuming, especially when there are tens of billions of user-item interactions. To address this challenge, we present StackRec, a simple yet very effective and efficient training framework for deep SR models based on iterative layer stacking. Specifically, we first offer an important insight: the hidden layers/blocks of a well-trained deep SR model have very similar distributions. Motivated by this, we propose a stacking operation on the pre-trained layers/blocks that transfers knowledge from a shallow model to a deep one, and we apply this stacking iteratively to yield a much deeper but easier-to-train SR model. We validate StackRec by instantiating it with four state-of-the-art SR models in three practical scenarios on real-world datasets. Extensive experiments show that, compared to SR models trained from scratch, StackRec achieves not only comparable performance but also substantial acceleration in training time. Code is available at https://github.com/wangjiachun0426/StackRec.
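To make the stacking operation concrete, the following is a minimal sketch in PyTorch. It assumes a model whose residual blocks are stored in an nn.ModuleList; the names stack_blocks, grow, model.blocks, and train are hypothetical illustrations of the idea, not the authors' implementation (see the linked repository for that).

```python
# A minimal sketch of iterative layer stacking, assuming a PyTorch-style
# model that keeps its residual blocks in an nn.ModuleList. All names
# here are illustrative, not the authors' released code.
import copy

import torch.nn as nn


def stack_blocks(blocks: nn.ModuleList) -> nn.ModuleList:
    """Double the depth by pairing every trained block with a deep copy
    of itself, so the deeper model starts from weights whose layer-wise
    distributions match the well-trained shallow parent."""
    doubled = []
    for block in blocks:
        doubled.append(block)                 # keep the trained block
        doubled.append(copy.deepcopy(block))  # duplicate its weights
    return nn.ModuleList(doubled)


def grow(model, train, target_depth=32):
    """Iterative stacking: train shallow, double the depth, fine-tune,
    and repeat until the target depth (here 32 blocks, an arbitrary
    example) is reached. `train` is a caller-supplied training loop."""
    while len(model.blocks) < target_depth:
        train(model)                          # (pre-)train at current depth
        model.blocks = stack_blocks(model.blocks)
    train(model)                              # final fine-tune at full depth
```

The copies are placed adjacent to their parents in this sketch; where the duplicated blocks are inserted is a design choice discussed in the paper.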
Pages: 357-366
Number of pages: 10