Generative Adversarial Reward Learning for Generalized Behavior Tendency Inference

Cited by: 5
Authors
Chen, Xiaocong [1]
Yao, Lina [1]
Wang, Xianzhi [2]
Sun, Aixin [3]
Sheng, Quan Z. [4]
Affiliations
[1] Univ New South Wales, Sch Comp Sci & Engn, Sydney, NSW 2052, Australia
[2] Univ Technol Sydney, Sch Comp Sci, Sydney, NSW 2007, Australia
[3] Nanyang Technol Univ, Singapore 639798, Singapore
[4] Macquarie Univ, Dept Comp, Sydney, NSW 2109, Australia
Keywords
Behavioral sciences; Reinforcement learning; Task analysis; Mathematical models; Generative adversarial networks; Recommender systems; Training; Inverse reinforcement learning; behavioral tendency modeling; adversarial training; generative model;
DOI
10.1109/TKDE.2022.3186920
CLC Number (Chinese Library Classification)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Recent advances in reinforcement learning have inspired increasing interest in learning user models adaptively through dynamic interactions, e.g., in reinforcement learning based recommender systems. In most reinforcement learning applications, reward functions provide the critical guideline for optimization. However, current reinforcement learning-based methods rely on manually defined reward functions, which cannot adapt to dynamic, noisy environments. Moreover, they generally use task-specific reward functions that sacrifice generalization ability. To address these issues, we propose a generative inverse reinforcement learning approach for user behavioral preference modeling. Instead of using predefined reward functions, our model automatically learns rewards from users' actions based on a discriminative actor-critic network and a Wasserstein GAN. Our model provides a general approach to characterizing and explaining underlying behavioral tendencies. Our experiments show that our method outperforms state-of-the-art methods in several scenarios, namely traffic signal control, online recommender systems, and scanpath prediction.
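The sketch below illustrates the adversarial reward-learning idea summarized in the abstract: a policy (actor) proposes actions, while a Wasserstein-style critic scores state-action pairs and serves as the learned reward, so that no reward function has to be defined by hand. It is a minimal, illustrative sketch, not the authors' implementation; the module names (RewardCritic, Policy), network sizes, toy dimensions, and the synthetic "expert" data are assumptions made purely for illustration.

# Minimal sketch of adversarial reward learning with a Wasserstein-style
# critic. NOT the paper's implementation; names, sizes, and the toy data
# below are assumptions for illustration only.
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, HIDDEN = 8, 4, 64  # assumed toy dimensions

class RewardCritic(nn.Module):
    """Scores (state, action) pairs; its output acts as the learned reward."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + ACTION_DIM, HIDDEN), nn.ReLU(),
            nn.Linear(HIDDEN, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

class Policy(nn.Module):
    """Actor that proposes an action for a given state."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, HIDDEN), nn.ReLU(),
            nn.Linear(HIDDEN, ACTION_DIM), nn.Tanh(),
        )

    def forward(self, state):
        return self.net(state)

critic, policy = RewardCritic(), Policy()
critic_opt = torch.optim.RMSprop(critic.parameters(), lr=5e-5)
policy_opt = torch.optim.RMSprop(policy.parameters(), lr=5e-5)

for step in range(200):
    # Synthetic "expert" behaviour stands in for logged user actions.
    expert_s = torch.randn(32, STATE_DIM)
    expert_a = torch.tanh(expert_s[:, :ACTION_DIM])  # assumed expert rule

    # Critic update: push expert scores up and policy scores down
    # (Wasserstein-style objective).
    policy_a = policy(expert_s).detach()
    critic_loss = critic(expert_s, policy_a).mean() - critic(expert_s, expert_a).mean()
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()
    # Weight clipping approximates the 1-Lipschitz constraint of WGAN.
    for p in critic.parameters():
        p.data.clamp_(-0.01, 0.01)

    # Policy (generator) update: act so the learned reward is maximized.
    policy_loss = -critic(expert_s, policy(expert_s)).mean()
    policy_opt.zero_grad()
    policy_loss.backward()
    policy_opt.step()

In this toy loop the critic plays the role of the reward model: as training proceeds, actions resembling the expert's receive higher scores, and the policy is driven toward the expert's behavioral tendency without any hand-crafted reward.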
Pages: 9878-9889
Number of pages: 12