Reinforcement Recommendation with User Multi-aspect Preference

Cited by: 12

Authors:

Chen, Xu [1]

Du, Yali [2]

Xia, Long [3]

Wang, Jun [2]

Affiliations:

[1] Renmin Univ China, Gaoling Sch Artificial Intelligence, Beijing Key Lab Big Data Management & Anal Method, Beijing, Peoples R China

[2] UCL, Dept Comp Sci, London, England

[3] York Univ, Sch Informat Technol, Toronto, ON, Canada

Source:

PROCEEDINGS OF THE WORLD WIDE WEB CONFERENCE 2021 (WWW 2021) | 2021

Funding:

National Natural Science Foundation of China;

Keywords:

Recommender system; Reinforcement learning; Multi-objective optimization;

DOI:

10.1145/3442381.3449846

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Formulating recommender systems with reinforcement learning (RL) frameworks has attracted increasing attention from both the academic and industrial communities. While many promising results have been achieved, existing models mostly simulate the environment reward with a unified value, which may hinder the understanding of users' complex preferences and limit model performance. In this paper, we consider how to model user multi-aspect preferences in the context of RL-based recommender systems. More specifically, we base our model on the framework of deterministic policy gradient (DPG), which is effective in dealing with large action spaces. A major challenge in modeling user multi-aspect preferences is that they may contradict each other. To solve this problem, we introduce Pareto optimization into the DPG framework. We assign each aspect a tailored critic, and all the critics share the same actor. The Pareto optimization is realized by a gradient-based method, which can be easily integrated into the actor and critic learning process. Based on the designed model, we theoretically analyze its gradient bias in the optimization process, and we design a weight-reuse mechanism to lower the upper bound of this bias, which is shown to be effective for improving model performance. We conduct extensive experiments on three real-world datasets to demonstrate our model's superiority.
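The abstract does not say which gradient-based Pareto method is used, so the following is only an illustrative sketch of one common choice: the two-objective case of min-norm gradient weighting, where the per-aspect actor gradients are combined with the convex weight that minimizes the norm of the result. The function names (`min_norm_weight`, `combine_gradients`) and the plain-list gradient representation are assumptions made for illustration and are not taken from the paper.

```python
def dot(u, v):
    """Inner product of two equal-length vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))


def min_norm_weight(g1, g2):
    """Weight alpha in [0, 1] minimizing ||alpha*g1 + (1-alpha)*g2||^2.

    g1 and g2 stand in for the actor gradients induced by two
    aspect-specific critics (hypothetical setup for this sketch).
    """
    diff = [a - b for a, b in zip(g1, g2)]
    denom = dot(diff, diff)
    if denom == 0.0:
        # Identical gradients: any convex weight gives the same result.
        return 0.5
    # Unconstrained minimizer of the quadratic, then clipped to [0, 1].
    alpha = dot([b - a for a, b in zip(g1, g2)], g2) / denom
    return min(1.0, max(0.0, alpha))


def combine_gradients(g1, g2):
    """Return (alpha, combined gradient) for a shared actor update."""
    alpha = min_norm_weight(g1, g2)
    combined = [alpha * a + (1.0 - alpha) * b for a, b in zip(g1, g2)]
    return alpha, combined


if __name__ == "__main__":
    # Two conflicting aspect gradients: the combination improves both.
    print(combine_gradients([1.0, 0.0], [0.0, 1.0]))
```

When the two gradients point in exactly opposite directions with equal norm, the combined gradient is the zero vector, which corresponds to a Pareto-stationary point where neither aspect can be improved without hurting the other.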


Pages: 425-435

Page count: 11

References: 25 in total
