Generative Adversarial Reward Learning for Generalized Behavior Tendency Inference

Cited by: 5
Authors
Chen, Xiaocong [1]
Yao, Lina [1]
Wang, Xianzhi [2]
Sun, Aixin [3]
Sheng, Quan Z. [4]
Affiliations
[1] Univ New South Wales, Sch Comp Sci & Engn, Sydney, NSW 2052, Australia
[2] Univ Technol Sydney, Sch Comp Sci, Sydney, NSW 2007, Australia
[3] Nanyang Technol Univ, Singapore 639798, Singapore
[4] Macquarie Univ, Dept Comp, Sydney, NSW 2109, Australia
Keywords
Behavioral sciences; Reinforcement learning; Task analysis; Mathematical models; Generative adversarial networks; Recommender systems; Training; Inverse reinforcement learning; behavioral tendency modeling; adversarial training; generative model;
DOI
10.1109/TKDE.2022.3186920
CLC Number (Chinese Library Classification)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Recent advances in reinforcement learning have inspired increasing interest in learning user models adaptively through dynamic interactions, e.g., in reinforcement learning based recommender systems. In most reinforcement learning applications, reward functions provide the critical guideline for optimization. However, current reinforcement learning-based methods rely on manually defined reward functions, which cannot adapt to dynamic, noisy environments. Moreover, they generally use task-specific reward functions that sacrifice generalization ability. To address these issues, we propose a generative inverse reinforcement learning approach for user behavioral preference modeling. Instead of using predefined reward functions, our model automatically learns rewards from users' actions based on a discriminative actor-critic network and a Wasserstein GAN. Our model provides a general approach to characterizing and explaining underlying behavioral tendencies. Our experiments show that our method outperforms state-of-the-art methods in several scenarios, namely traffic signal control, online recommender systems, and scanpath prediction.
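The sketch below illustrates the adversarial reward-learning idea summarized in the abstract: a policy (actor) proposes actions, while a Wasserstein-style critic scores state-action pairs and serves as the learned reward, so that no reward function has to be defined by hand. It is a minimal, illustrative sketch, not the authors' implementation; the module names (RewardCritic, Policy), network sizes, toy dimensions, and the synthetic "expert" data are assumptions made purely for illustration.

# Minimal sketch of adversarial reward learning with a Wasserstein-style
# critic. NOT the paper's implementation; names, sizes, and the toy data
# below are assumptions for illustration only.
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, HIDDEN = 8, 4, 64  # assumed toy dimensions

class RewardCritic(nn.Module):
    """Scores (state, action) pairs; its output acts as the learned reward."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + ACTION_DIM, HIDDEN), nn.ReLU(),
            nn.Linear(HIDDEN, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

class Policy(nn.Module):
    """Actor that proposes an action for a given state."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, HIDDEN), nn.ReLU(),
            nn.Linear(HIDDEN, ACTION_DIM), nn.Tanh(),
        )

    def forward(self, state):
        return self.net(state)

critic, policy = RewardCritic(), Policy()
critic_opt = torch.optim.RMSprop(critic.parameters(), lr=5e-5)
policy_opt = torch.optim.RMSprop(policy.parameters(), lr=5e-5)

for step in range(200):
    # Synthetic "expert" behaviour stands in for logged user actions.
    expert_s = torch.randn(32, STATE_DIM)
    expert_a = torch.tanh(expert_s[:, :ACTION_DIM])  # assumed expert rule

    # Critic update: push expert scores up and policy scores down
    # (Wasserstein-style objective).
    policy_a = policy(expert_s).detach()
    critic_loss = critic(expert_s, policy_a).mean() - critic(expert_s, expert_a).mean()
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()
    # Weight clipping approximates the 1-Lipschitz constraint of WGAN.
    for p in critic.parameters():
        p.data.clamp_(-0.01, 0.01)

    # Policy (generator) update: act so the learned reward is maximized.
    policy_loss = -critic(expert_s, policy(expert_s)).mean()
    policy_opt.zero_grad()
    policy_loss.backward()
    policy_opt.step()

In this toy loop the critic plays the role of the reward model: as training proceeds, actions resembling the expert's receive higher scores, and the policy is driven toward the expert's behavioral tendency without any hand-crafted reward.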
Pages: 9878-9889
Number of pages: 12