Transformers4Rec: Bridging the Gap between NLP and Sequential / Session-Based Recommendation

Cited by: 111
Authors
Pereira Moreira, Gabriel de Souza [1 ]
Rabhi, Sara [2 ]
Lee, Jeong Min [3 ,4 ]
Ak, Ronay [4 ]
Oldridge, Even [5 ]
Affiliations
[1] NVIDIA, São Paulo, Brazil
[2] NVIDIA, Toronto, ON, Canada
[3] Facebook AI, Menlo Park, CA, USA
[4] NVIDIA, Orlando, FL, USA
[5] NVIDIA, Vancouver, BC, Canada
Source
15th ACM Conference on Recommender Systems (RecSys 2021) | 2021
Keywords
DOI
10.1145/3460231.3474255
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Much of the recent progress in sequential and session-based recommendation has been driven by improvements in model architecture and pretraining techniques originating in the field of Natural Language Processing. Transformer architectures in particular have facilitated building higher-capacity models and provided data augmentation and training techniques which demonstrably improve the effectiveness of sequential recommendation. But with a thousandfold more research going on in NLP, the application of transformers for recommendation understandably lags behind. To remedy this we introduce Transformers4Rec, an open-source library built upon HuggingFace's Transformers library with a similar goal of opening up the advances of NLP based Transformers to the recommender system community and making these advancements immediately accessible for the tasks of sequential and session-based recommendation. Like its core dependency, Transformers4Rec is designed to be extensible by researchers, simple for practitioners, and fast and robust in industrial deployments. In order to demonstrate the usefulness of the library and the applicability of Transformer architectures in next-click prediction for user sessions, where sequence lengths are much shorter than those commonly found in NLP, we have leveraged Transformers4Rec to win two recent session-based recommendation competitions. In addition, we present in this paper the first comprehensive empirical analysis comparing many Transformer architectures and training approaches for the task of session-based recommendation. We demonstrate that the best Transformer architectures have superior performance across two e-commerce datasets while performing similarly to the baselines on two news datasets. We further evaluate in isolation the effectiveness of the different training techniques used in causal language modeling, masked language modeling, permutation language modeling and replacement token detection for a single Transformer architecture, XLNet. We establish that training XLNet with replacement token detection performs well across all datasets. Finally, we explore techniques to include side information such as item and user context features in order to establish best practices and show that the inclusion of side information uniformly improves recommendation performance.
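As a rough illustration of one idea from the abstract (adapting masked language modeling from word tokens to item-ID sequences for next-item prediction), the following minimal PyTorch sketch masks random items in a session and trains a small Transformer encoder to recover them. It does not use the Transformers4Rec API; the names SessionTransformer and mlm_step and all hyperparameters are assumptions chosen for illustration only.

import torch
import torch.nn as nn


class SessionTransformer(nn.Module):
    """Tiny Transformer encoder over item-ID sequences.
    IDs: 0 = padding, 1..n_items = catalogue items, n_items + 1 = [MASK]."""

    def __init__(self, n_items, d_model=64, n_heads=4, n_layers=2, max_len=20):
        super().__init__()
        self.mask_token = n_items + 1
        self.item_emb = nn.Embedding(n_items + 2, d_model, padding_idx=0)
        self.pos_emb = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.out = nn.Linear(d_model, n_items + 2)

    def forward(self, item_ids):
        positions = torch.arange(item_ids.size(1), device=item_ids.device)
        h = self.item_emb(item_ids) + self.pos_emb(positions)
        h = self.encoder(h, src_key_padding_mask=item_ids.eq(0))
        return self.out(h)  # per-position logits over the item catalogue


def mlm_step(model, sessions, mask_prob=0.2):
    """Masked-item-modeling loss: hide a fraction of the non-padding items
    and train the model to recover their original IDs at those positions."""
    labels = sessions.clone()
    maskable = sessions.ne(0)
    masked = maskable & (torch.rand_like(sessions, dtype=torch.float) < mask_prob)
    inputs = sessions.masked_fill(masked, model.mask_token)
    logits = model(inputs)
    if not masked.any():
        return logits.sum() * 0.0  # keep the graph valid when nothing was masked
    return nn.functional.cross_entropy(logits[masked], labels[masked])

A causal-language-modeling variant of the same sketch would instead apply an autoregressive attention mask and shift the targets by one position, while replacement token detection would train a discriminator to spot corrupted items rather than reconstruct them.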
Pages: 143-153
Page count: 11