Generative Adversarial Reward Learning for Generalized Behavior Tendency Inference

Cited by: 5
Authors
Chen, Xiaocong [1 ]
Yao, Lina [1 ]
Wang, Xianzhi [2 ]
Sun, Aixin [3 ]
Sheng, Quan Z. [4 ]
Affiliations
[1] Univ New South Wales, Sch Comp Sci & Engn, Sydney, NSW 2052, Australia
[2] Univ Technol Sydney, Sch Comp Sci, Sydney, NSW 2007, Australia
[3] Nanyang Technol Univ, Singapore 639798, Singapore
[4] Macquarie Univ, Dept Comp, Sydney, NSW 2109, Australia
Keywords
Behavioral sciences; Reinforcement learning; Task analysis; Mathematical models; Generative adversarial networks; Recommender systems; Training; Inverse reinforcement learning; behavioral tendency modeling; adversarial training; generative model
DOI
10.1109/TKDE.2022.3186920
CLC number
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Recent advances in reinforcement learning have inspired growing interest in learning user models adaptively through dynamic interactions, e.g., in reinforcement learning based recommender systems. In most reinforcement learning applications, the reward function provides the critical guidance for optimization. However, current reinforcement learning-based methods rely on manually defined reward functions, which cannot adapt to dynamic, noisy environments; moreover, they generally use task-specific reward functions that sacrifice generalization ability. To address these issues, we propose a generative inverse reinforcement learning approach for user behavioral preference modeling. Instead of relying on a predefined reward function, our model automatically learns rewards from users' actions using a discriminative actor-critic network and a Wasserstein GAN. The model thus provides a general approach to characterizing and explaining underlying behavioral tendencies. Our experiments show that the method outperforms state-of-the-art baselines in several scenarios, namely traffic signal control, online recommender systems, and scanpath prediction.
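The core idea described in the abstract is that the reward function is not hand-crafted but learned adversarially: a discriminator is trained, WGAN-style, to distinguish observed user (expert) state-action pairs from those generated by the current policy, and its score then serves as the reward signal for an actor-critic learner. The PyTorch sketch below illustrates this adversarial reward-learning step under stated assumptions; the network sizes, hyperparameters, and random stand-in data are placeholders, not the authors' implementation.

```python
# Minimal sketch of adversarial reward learning: a discriminator ("reward
# model") is trained with a Wasserstein critic loss plus gradient penalty
# (WGAN-GP, Gulrajani et al., 2017) to separate expert state-action pairs
# from policy-generated ones. All dimensions and networks are illustrative.
import torch
import torch.nn as nn

class RewardDiscriminator(nn.Module):
    """Scores (state, action) pairs; higher means more expert-like."""
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

def gradient_penalty(disc, expert_sa, policy_sa):
    """WGAN-GP penalty computed on interpolated state-action samples."""
    eps = torch.rand(expert_sa.size(0), 1)
    mix = (eps * expert_sa + (1 - eps) * policy_sa).requires_grad_(True)
    score = disc.net(mix)
    grad, = torch.autograd.grad(score.sum(), mix, create_graph=True)
    return ((grad.norm(2, dim=1) - 1) ** 2).mean()

# --- one toy critic update with random stand-in data ---
state_dim, action_dim, batch = 8, 4, 32
disc = RewardDiscriminator(state_dim, action_dim)
opt = torch.optim.Adam(disc.parameters(), lr=1e-4)

expert_s, expert_a = torch.randn(batch, state_dim), torch.randn(batch, action_dim)
policy_s, policy_a = torch.randn(batch, state_dim), torch.randn(batch, action_dim)

# Wasserstein critic loss: push expert scores up, policy scores down.
d_loss = (disc(policy_s, policy_a).mean()
          - disc(expert_s, expert_a).mean()
          + 10.0 * gradient_penalty(
                disc,
                torch.cat([expert_s, expert_a], dim=-1),
                torch.cat([policy_s, policy_a], dim=-1)))
opt.zero_grad()
d_loss.backward()
opt.step()

# The discriminator's score is the learned reward handed to the policy update.
with torch.no_grad():
    reward = disc(policy_s, policy_a).squeeze(-1)
```

In the full method, the policy would then be improved with these learned rewards via an actor-critic gradient step, and the two players alternate until the generated behavior matches the observed user behavior.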
Pages: 9878-9889
Page count: 12