Training Large-Scale News Recommenders with Pretrained Language Models in the Loop

Cited by: 16
Authors
Xiao, Shitao [1]
Liu, Zheng [2]
Shao, Yingxia [1]
Di, Tao [3]
Middha, Bhuvan
Wu, Fangzhao [2]
Xie, Xing [2]
Affiliations
[1] Beijing Univ Posts & Telecommun, Beijing, Peoples R China
[2] Microsoft Res Asia, Beijing, Peoples R China
[3] Microsoft, Redmond, WA USA
Source
PROCEEDINGS OF THE 28TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2022 | 2022
Funding
National Natural Science Foundation of China
Keywords
News Recommendation; Pretrained Language Models; Training Framework; Efficiency and Effectiveness;
DOI
10.1145/3534678.3539120
CLC Number
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
News recommendation calls for deep insight into the underlying semantics of news articles. Pretrained language models (PLMs), such as BERT and RoBERTa, can therefore contribute substantially to recommendation quality. However, it is extremely challenging to train news recommenders together with such big models: the learning of news recommenders requires intensive news encoding operations, whose cost becomes prohibitive when PLMs serve as the news encoder. In this paper, we propose a novel framework, SpeedyFeed, which efficiently trains PLM-based news recommenders of superior quality. SpeedyFeed is distinguished by its lightweight encoding pipeline, which yields three major advantages. First, it makes intermediate results fully reusable across the training workflow, removing most of the repetitive yet redundant encoding operations. Second, it improves the data efficiency of the training workflow by eliminating non-informative data from encoding. Third, it further reduces cost through simplified news encoding and compact news representation. SpeedyFeed accelerates the training process by more than 100x, enabling big models to be trained efficiently and effectively over massive user data. The well-trained PLM-based model significantly outperforms state-of-the-art news recommenders in comprehensive offline experiments. It has been applied to Microsoft News to power the training of large-scale production models, which demonstrate highly competitive online performance. SpeedyFeed is also a model-agnostic framework, and thus potentially applicable to a wide spectrum of content-based recommender systems. We have made the source code publicly available to facilitate research and applications in related areas.
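The reuse idea at the heart of the abstract can be illustrated with a minimal sketch. The Python/PyTorch snippet below is not the authors' implementation; the NewsEncodingCache class, the encoder callable, and the cache policy are all illustrative assumptions. It shows only the core pattern of the lightweight encoding pipeline: each news article passes through the expensive PLM encoder at most once, and later impressions of the same article reuse the cached vector instead of re-running the encoder.

```python
# Minimal sketch (assumed names, not the paper's code) of cache-and-reuse
# news encoding: the PLM runs only on cache misses, so repeated impressions
# of the same article skip the expensive forward pass.
import torch


class NewsEncodingCache:
    def __init__(self, encoder):
        self.encoder = encoder   # PLM-based news encoder (any callable: tokens -> embeddings)
        self.cache = {}          # news_id -> cached news embedding

    def encode(self, news_ids, news_tokens):
        """Encode a batch of news, running the PLM only on cache misses."""
        miss_idx = [i for i, nid in enumerate(news_ids) if nid not in self.cache]
        if miss_idx:
            fresh = self.encoder(news_tokens[miss_idx])   # expensive PLM pass, misses only
            for i, emb in zip(miss_idx, fresh):
                self.cache[news_ids[i]] = emb.detach()    # store for reuse in later steps
        return torch.stack([self.cache[nid] for nid in news_ids])
```

When and how cached states are refreshed, and how gradient signal is preserved for the encoder on cache hits, are precisely the parts the paper engineers carefully; the detach() above keeps the sketch minimal and does not reflect the framework's actual policy.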
Pages: 4215-4225
Page count: 11