Generative Adversarial Reward Learning for Generalized Behavior Tendency Inference

Cited by: 5
Authors
Chen, Xiaocong [1 ]
Yao, Lina [1 ]
Wang, Xianzhi [2 ]
Sun, Aixin [3 ]
Sheng, Quan Z. [4 ]
Affiliations
[1] Univ New South Wales, Sch Comp Sci & Engn, Sydney, NSW 2052, Australia
[2] Univ Technol Sydney, Sch Comp Sci, Sydney, NSW 2007, Australia
[3] Nanyang Technol Univ, Singapore 639798, Singapore
[4] Macquarie Univ, Dept Comp, Sydney, NSW 2109, Australia
Keywords
Behavioral sciences; Reinforcement learning; Task analysis; Mathematical models; Generative adversarial networks; Recommender systems; Training; Inverse reinforcement learning; behavioral tendency modeling; adversarial training; generative model
DOI
10.1109/TKDE.2022.3186920
CLC number
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Recent advances in reinforcement learning have inspired growing interest in learning user models adaptively through dynamic interactions, e.g., in reinforcement learning based recommender systems. In most reinforcement learning applications, the reward function provides the critical guidance for optimization. However, current reinforcement learning-based methods rely on manually defined reward functions, which cannot adapt to dynamic, noisy environments; moreover, they generally use task-specific reward functions that sacrifice generalization ability. To address these issues, we propose a generative inverse reinforcement learning approach for user behavioral preference modeling. Instead of relying on a predefined reward function, our model automatically learns rewards from users' actions using a discriminative actor-critic network and a Wasserstein GAN. The model thus provides a general approach to characterizing and explaining underlying behavioral tendencies. Our experiments show that the method outperforms state-of-the-art baselines in several scenarios, namely traffic signal control, online recommender systems, and scanpath prediction.
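The core idea described in the abstract is that the reward function is not hand-crafted but learned adversarially: a discriminator is trained, WGAN-style, to distinguish observed user (expert) state-action pairs from those generated by the current policy, and its score then serves as the reward signal for an actor-critic learner. The PyTorch sketch below illustrates this adversarial reward-learning step under stated assumptions; the network sizes, hyperparameters, and random stand-in data are placeholders, not the authors' implementation.

```python
# Minimal sketch of adversarial reward learning: a discriminator ("reward
# model") is trained with a Wasserstein critic loss plus gradient penalty
# (WGAN-GP, Gulrajani et al., 2017) to separate expert state-action pairs
# from policy-generated ones. All dimensions and networks are illustrative.
import torch
import torch.nn as nn

class RewardDiscriminator(nn.Module):
    """Scores (state, action) pairs; higher means more expert-like."""
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

def gradient_penalty(disc, expert_sa, policy_sa):
    """WGAN-GP penalty computed on interpolated state-action samples."""
    eps = torch.rand(expert_sa.size(0), 1)
    mix = (eps * expert_sa + (1 - eps) * policy_sa).requires_grad_(True)
    score = disc.net(mix)
    grad, = torch.autograd.grad(score.sum(), mix, create_graph=True)
    return ((grad.norm(2, dim=1) - 1) ** 2).mean()

# --- one toy critic update with random stand-in data ---
state_dim, action_dim, batch = 8, 4, 32
disc = RewardDiscriminator(state_dim, action_dim)
opt = torch.optim.Adam(disc.parameters(), lr=1e-4)

expert_s, expert_a = torch.randn(batch, state_dim), torch.randn(batch, action_dim)
policy_s, policy_a = torch.randn(batch, state_dim), torch.randn(batch, action_dim)

# Wasserstein critic loss: push expert scores up, policy scores down.
d_loss = (disc(policy_s, policy_a).mean()
          - disc(expert_s, expert_a).mean()
          + 10.0 * gradient_penalty(
                disc,
                torch.cat([expert_s, expert_a], dim=-1),
                torch.cat([policy_s, policy_a], dim=-1)))
opt.zero_grad()
d_loss.backward()
opt.step()

# The discriminator's score is the learned reward handed to the policy update.
with torch.no_grad():
    reward = disc(policy_s, policy_a).squeeze(-1)
```

In the full method, the policy would then be improved with these learned rewards via an actor-critic gradient step, and the two players alternate until the generated behavior matches the observed user behavior.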
Pages: 9878-9889
Page count: 12