State representation modeling for deep reinforcement learning based recommendation

Cited: 44
Authors
Liu, Feng [1 ]
Tang, Ruiming [2 ]
Li, Xutao [1 ]
Zhang, Weinan [3 ]
Ye, Yunming [1 ]
Chen, Haokun [3 ]
Guo, Huifeng [2 ]
Zhang, Yuzhou [2 ]
He, Xiuqiang [2 ]
Affiliations
[1] Harbin Inst Technol, Shenzhen Grad Sch, Shenzhen Key Lab Internet Informat Collaborat, Shenzhen 518055, Peoples R China
[2] Noahs Ark Lab, Huawei, Peoples R China
[3] Shanghai Jiao Tong Univ, Shanghai, Peoples R China
Funding
National Key Research and Development Program of China;
Keywords
State representation modeling; Deep reinforcement learning; Recommendation;
DOI
10.1016/j.knosys.2020.106170
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Reinforcement learning techniques have recently been introduced into interactive recommender systems to capture the dynamic patterns of user behavior during interaction with the system and to perform planning that optimizes long-term performance. Most existing work focuses on designing the policy and learning algorithms of the recommender agent but seldom considers the state representation of the environment, which is essential for recommendation decision making. In this paper, we first formulate the interactive recommendation problem within a deep reinforcement learning framework. We then carefully design four state representation schemes for learning the recommendation policy. Inspired by recent advances in feature interaction modeling for user response prediction, we find that explicitly modeling user-item interactions in the state representation substantially helps the recommendation policy perform effective reinforcement learning. Extensive experiments on four real-world datasets are conducted under both offline and simulated online evaluation settings. The experimental results demonstrate that the proposed state representation schemes outperform state-of-the-art methods. (C) 2020 Elsevier B.V. All rights reserved.
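The abstract's central idea, explicitly modeling user-item interactions when building the agent's state, can be sketched as follows. This is a minimal illustration only: the variable names, embedding dimensions, and the average-pooling plus Hadamard-product design are assumptions for exposition, not the paper's actual four state representation schemes.

```python
import numpy as np

rng = np.random.default_rng(0)
embed_dim = 8

def state_representation(user_emb, item_embs):
    """Build a state vector that explicitly models user-item interactions.

    Concatenates the user embedding, an average-pooled summary of the
    recently interacted items, and their element-wise (Hadamard)
    interaction, in the spirit of feature-interaction modeling for
    user response prediction.
    """
    item_summary = item_embs.mean(axis=0)   # pooled interaction history
    interaction = user_emb * item_summary   # explicit user-item product
    return np.concatenate([user_emb, item_summary, interaction])

user_emb = rng.normal(size=embed_dim)            # one user's embedding
item_embs = rng.normal(size=(5, embed_dim))      # last 5 interacted items
state = state_representation(user_emb, item_embs)
print(state.shape)  # (24,)
```

Such a state vector would then be fed to the recommendation policy network; the hypothesis tested in the paper is that including the explicit interaction term leads to more effective reinforcement learning than concatenating user and item features alone.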
Pages: 12