Model-free Reinforcement Learning with Stochastic Reward Stabilization for Recommender Systems

Cited by: 0

Authors:

Cai, Tianchi [1]

Bao, Shenliao [1]

Jiang, Jiyan [2]

Zhou, Shiji [2]

Zhang, Wenpeng [1]

Gu, Lihong [1]

Gu, Jinjie [1]

Zhang, Guannan [1]

Affiliations:

[1] Ant Group, Hangzhou, People's Republic of China

[2] Tsinghua University, Beijing, People's Republic of China

Source:

PROCEEDINGS OF THE 46TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2023 | 2023

Keywords:

Recommender System; Reinforcement Learning

DOI:

10.1145/3539618.3592022

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology]

Discipline classification code:

0812

Abstract:

Model-free RL-based recommender systems have recently received increasing research attention because they can handle partial feedback and long-term rewards. However, most existing research has ignored a critical feature of recommender systems: a user's feedback on the same item at different times is random. This stochastic-reward property differs fundamentally from classic RL scenarios with deterministic rewards, and it makes RL-based recommender systems much more challenging. In this paper, we first demonstrate in a simulator environment that using direct stochastic feedback results in a significant drop in performance. Then, to handle the stochastic feedback more efficiently, we design two stochastic reward stabilization frameworks that replace the direct stochastic feedback with feedback learned by a supervised model. Both frameworks are model-agnostic, i.e., they can effectively utilize various supervised models. We demonstrate the superiority of the proposed frameworks over different RL-based recommendation baselines with extensive experiments on a recommendation simulator as well as an industrial-level recommender system.
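
The core idea in the abstract, replacing a noisy sampled reward with a value learned by a supervised model before it reaches the RL learner, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`fit_reward_model`, `stabilize_rewards`), the per-pair averaging used as a stand-in for a supervised model, and the click probability of 0.3 are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def fit_reward_model(transitions):
    """Hypothetical supervised stand-in: mean observed feedback per (state, action)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for state, action, reward in transitions:
        totals[(state, action)] += reward
        counts[(state, action)] += 1
    return {key: totals[key] / counts[key] for key in totals}


def stabilize_rewards(transitions, reward_model):
    """Replace each sampled reward with the model's estimate for that pair."""
    return [(s, a, reward_model[(s, a)]) for s, a, _ in transitions]


def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


# Simulated logs: the same user sees the same item many times and clicks
# at random. The probability below is an assumption for this demo only.
rng = random.Random(0)
TRUE_CLICK_PROB = 0.3
raw = [
    ("user_1", "item_1", 1.0 if rng.random() < TRUE_CLICK_PROB else 0.0)
    for _ in range(1000)
]

model = fit_reward_model(raw)
stabilized = stabilize_rewards(raw, model)

raw_var = variance([r for _, _, r in raw])
stab_var = variance([r for _, _, r in stabilized])
print("estimated reward:", model[("user_1", "item_1")])
print("variance of raw rewards:", raw_var)
print("variance of stabilized rewards:", stab_var)
```

In this toy setting the stabilized transitions carry one consistent reward per (user, item) pair, so any downstream value update would see a far less noisy target than with the raw clicks. A real system would presumably swap the averaging step for a trained click or conversion predictor, which is what "model-agnostic" suggests.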

Citation

Pages: 2179-2183

Page count: 5
