Sequential Recommendation via Stochastic Self-Attention

Cited by: 112
Authors
Fan, Ziwei [1 ,5 ]
Liu, Zhiwei [1 ]
Wang, Yu [1 ]
Wang, Alice [2 ]
Nazari, Zahra [2 ]
Zheng, Lei [3 ]
Peng, Hao [4 ]
Yu, Philip S. [1 ]
Affiliations
[1] Univ Illinois, Dept Comp Sci, Chicago, IL 60680 USA
[2] Spotify, New York, NY USA
[3] Pinterest Inc, Chicago, IL USA
[4] Beihang Univ, Sch Cyber Sci & Technol, Beijing, Peoples R China
[5] Spotify Res, New York, NY USA
Source
PROCEEDINGS OF THE ACM WEB CONFERENCE 2022 (WWW'22) | 2022
Keywords
Sequential Recommendation; Transformer; Self-Attention; Uncertainty
DOI
10.1145/3485447.3512077
CLC Number
TP3 [Computing Technology, Computer Technology]
Subject Classification Code
0812
Abstract
Sequential recommendation models the dynamics of a user's previous behaviors in order to forecast the next item, and has attracted considerable attention. Transformer-based approaches, which embed items as vectors and use dot-product self-attention to measure relationships between items, demonstrate superior capabilities among existing sequential methods. However, users' real-world sequential behaviors are uncertain rather than deterministic, posing a significant challenge to existing techniques. We further suggest that dot-product-based approaches cannot fully capture collaborative transitivity, which can be derived from item-item transitions within sequences and is beneficial for cold-start items. We also argue that the BPR loss imposes no constraint on positive and sampled negative items, which can mislead optimization. We propose a novel STOchastic Self-Attention (STOSA) model to overcome these issues. In particular, STOSA embeds each item as a stochastic Gaussian distribution whose covariance encodes the uncertainty. We devise a novel Wasserstein Self-Attention module to characterize item-item position-wise relationships in sequences, which effectively incorporates uncertainty into model training. Wasserstein attention also facilitates collaborative transitivity learning because it satisfies the triangle inequality. Moreover, we introduce a novel regularization term into the ranking loss, which ensures dissimilarity between positive and sampled negative items. Extensive experiments on five real-world benchmark datasets demonstrate the superiority of the proposed model over state-of-the-art baselines, especially on cold-start items. The code is available at https://github.com/zfan20/STOSA.
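To make the mechanism in the abstract concrete, below is a minimal PyTorch sketch of the closed-form squared 2-Wasserstein distance between diagonal Gaussians, used as a negated attention score. This is a reconstruction from the abstract alone, not the repository's API: the diagonal-covariance assumption, the softmax over negated distances, the 1/sqrt(d) temperature, and the function names are all illustrative; consult https://github.com/zfan20/STOSA for the authors' actual implementation.

```python
import torch

def wasserstein2_sq(mu1, sigma1, mu2, sigma2):
    """Squared 2-Wasserstein distance between diagonal Gaussians.

    For N(mu1, diag(sigma1^2)) and N(mu2, diag(sigma2^2)) the closed form is
    ||mu1 - mu2||^2 + ||sigma1 - sigma2||^2. Its square root is a metric and
    satisfies the triangle inequality, the property the abstract credits for
    collaborative transitivity.
    """
    return ((mu1 - mu2) ** 2).sum(-1) + ((sigma1 - sigma2) ** 2).sum(-1)

def wasserstein_attention(mu_q, sig_q, mu_k, sig_k, mu_v, sig_v):
    """Attention over item distributions: smaller distance => larger weight.

    mu_* / sig_*: (batch, seq_len, d) means and positive standard deviations.
    """
    # Pairwise query-key distances, shape (batch, seq_len, seq_len).
    dist = wasserstein2_sq(mu_q.unsqueeze(2), sig_q.unsqueeze(2),
                           mu_k.unsqueeze(1), sig_k.unsqueeze(1))
    # Negate so nearby distributions get high weight; the sqrt(d) temperature
    # mirrors dot-product attention and is an assumption of this sketch.
    attn = torch.softmax(-dist / mu_q.size(-1) ** 0.5, dim=-1)
    # Mix mean and covariance streams with the same weights (a simplification;
    # strict variance propagation would square the weights).
    return attn @ mu_v, attn @ sig_v
```

The regularizer described in the abstract could analogously be sketched as a hinge on the gap between the positive item's distance and the sampled negative item's distance to the sequence representation, pushing negatives farther away than positives.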
Pages: 2036-2047
Page count: 12