Training Large-Scale News Recommenders with Pretrained Language Models in the Loop

Cited by: 16
Authors
Xiao, Shitao [1]
Liu, Zheng [2]
Shao, Yingxia [1]
Di, Tao [3]
Middha, Bhuvan
Wu, Fangzhao [2]
Xie, Xing [2]
Affiliations
[1] Beijing Univ Posts & Telecommun, Beijing, Peoples R China
[2] Microsoft Res Asia, Beijing, Peoples R China
[3] Microsoft, Redmond, WA USA
Source
PROCEEDINGS OF THE 28TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2022 | 2022
Funding
National Natural Science Foundation of China
Keywords
News Recommendation; Pretrained Language Models; Training Framework; Efficiency and Effectiveness;
DOI
10.1145/3534678.3539120
CLC Number
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
News recommendation calls for deep insight into the underlying semantics of news articles. Pretrained language models (PLMs), such as BERT and RoBERTa, can therefore contribute substantially to recommendation quality. However, it is extremely challenging to train news recommenders together with such big models: the learning of news recommenders requires intensive news encoding operations, whose cost becomes prohibitive when PLMs serve as the news encoder. In this paper, we propose a novel framework, SpeedyFeed, which efficiently trains PLM-based news recommenders of superior quality. SpeedyFeed is distinguished by its lightweight encoding pipeline, which yields three major advantages. First, it makes intermediate results fully reusable across the training workflow, removing most of the repetitive yet redundant encoding operations. Second, it improves the data efficiency of the training workflow by eliminating non-informative data from encoding. Third, it further reduces cost through simplified news encoding and compact news representation. SpeedyFeed accelerates the training process by more than 100x, enabling big models to be trained efficiently and effectively over massive user data. The well-trained PLM-based model significantly outperforms state-of-the-art news recommenders in comprehensive offline experiments. It has been applied to Microsoft News to power the training of large-scale production models, which demonstrate highly competitive online performance. SpeedyFeed is also a model-agnostic framework, and thus potentially applicable to a wide spectrum of content-based recommender systems. We have made the source code publicly available to facilitate research and applications in related areas.
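The reuse idea at the heart of the abstract can be illustrated with a minimal sketch. The Python/PyTorch snippet below is not the authors' implementation; the NewsEncodingCache class, the encoder callable, and the cache policy are all illustrative assumptions. It shows only the core pattern of the lightweight encoding pipeline: each news article passes through the expensive PLM encoder at most once, and later impressions of the same article reuse the cached vector instead of re-running the encoder.

```python
# Minimal sketch (assumed names, not the paper's code) of cache-and-reuse
# news encoding: the PLM runs only on cache misses, so repeated impressions
# of the same article skip the expensive forward pass.
import torch


class NewsEncodingCache:
    def __init__(self, encoder):
        self.encoder = encoder   # PLM-based news encoder (any callable: tokens -> embeddings)
        self.cache = {}          # news_id -> cached news embedding

    def encode(self, news_ids, news_tokens):
        """Encode a batch of news, running the PLM only on cache misses."""
        miss_idx = [i for i, nid in enumerate(news_ids) if nid not in self.cache]
        if miss_idx:
            fresh = self.encoder(news_tokens[miss_idx])   # expensive PLM pass, misses only
            for i, emb in zip(miss_idx, fresh):
                self.cache[news_ids[i]] = emb.detach()    # store for reuse in later steps
        return torch.stack([self.cache[nid] for nid in news_ids])
```

When and how cached states are refreshed, and how gradient signal is preserved for the encoder on cache hits, are precisely the parts the paper engineers carefully; the detach() above keeps the sketch minimal and does not reflect the framework's actual policy.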
Pages: 4215-4225
Page count: 11