Plug-and-Play Model-Agnostic Counterfactual Policy Synthesis for Deep Reinforcement Learning-Based Recommendation

Cited by: 2
Authors
Wang, Siyu [1]
Chen, Xiaocong [2]
McAuley, Julian [3]
Cripps, Sally [4]
Yao, Lina [1,2]
Affiliations
[1] Univ New South Wales, Sch Comp Sci & Engn, Sydney, NSW 2052, Australia
[2] CSIRO, Data61, Eveleigh, NSW 2015, Australia
[3] Univ Calif San Diego UCSD, Comp Sci Dept, La Jolla, CA 92093 USA
[4] Univ Technol Sydney, Human Technol Inst, Sydney, NSW 2007, Australia
Keywords
Recommender systems; mathematical models; data models; training; learning systems; causality; counterfactual; deep reinforcement learning (DRL); policy synthesis
DOI
10.1109/TNNLS.2023.3329808
CLC number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Recent advances in recommender systems have demonstrated the potential of reinforcement learning (RL) to handle the dynamic interaction process between users and recommender systems. However, training an optimal RL agent is generally impractical given the sparse user feedback that is common in recommender systems. To circumvent this lack of interaction data in current RL-based recommender systems, we propose to learn a general model-agnostic counterfactual synthesis (MACS) policy for counterfactual user interaction data augmentation. The counterfactual synthesis policy aims to synthesize counterfactual states while preserving the information in the original state that is significant to the user's interests, building on two training approaches we designed: learning with expert demonstrations and joint training. As a result, each counterfactual data point is synthesized based on the current recommendation agent's interaction with the environment, adapting to users' dynamic interests. We integrate the proposed policy with deep deterministic policy gradient (DDPG), soft actor-critic (SAC), and twin delayed DDPG (TD3) in an adaptive pipeline in which a recommendation agent can generate counterfactual data to improve recommendation performance. Empirical results on both online simulation and offline datasets demonstrate the effectiveness and generalization of our counterfactual synthesis policy and verify that it improves the performance of RL recommendation agents.
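To make the augmentation pipeline described in the abstract concrete, below is a minimal, hypothetical Python sketch of how a plug-and-play counterfactual synthesis policy could wrap an off-the-shelf RL recommendation agent's replay data. All names here (CounterfactualPolicy, augment_replay, reward_model) are illustrative assumptions rather than the authors' actual implementation; the MACS policy in the paper is itself learned, whereas this stand-in uses a fixed bounded perturbation purely for brevity.

# Hypothetical sketch (not the authors' code) of the plug-and-play
# augmentation loop: a counterfactual synthesis policy perturbs observed
# user states while preserving most of their content, and the synthesized
# transitions are appended to the agent's replay data.
import numpy as np

class CounterfactualPolicy:
    """Toy stand-in for the learned MACS policy."""
    def __init__(self, scale=0.05, seed=0):
        self.rng = np.random.default_rng(seed)
        self.scale = scale

    def synthesize(self, state):
        # Keep the original state's signal; add a small, bounded perturbation.
        return state + self.scale * self.rng.standard_normal(state.shape)

def augment_replay(replay, cf_policy, reward_model):
    """Append one counterfactual transition per observed (s, a, r, s') tuple."""
    augmented = []
    for (s, a, r, s_next) in replay:
        s_cf = cf_policy.synthesize(s)
        r_cf = reward_model(s_cf, a)  # re-score the action under the new state
        augmented.append((s_cf, a, r_cf, s_next))
    return replay + augmented

# Toy usage: one observed transition yields one extra counterfactual sample.
replay = [(np.ones(4), 2, 1.0, np.zeros(4))]
data = augment_replay(replay, CounterfactualPolicy(), lambda s, a: float(s.mean()))
print(len(data))  # 2: the original transition plus its counterfactual

Because the recommendation agent (DDPG, SAC, or TD3 in the paper) only ever sees an enlarged replay set, nothing inside the agent itself has to change, which is what makes the approach model-agnostic.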
Pages: 1044-1055
Page count: 12
相关论文
共 46 条
[1]  
Rusu AA, 2016, Arxiv, DOI arXiv:1511.06295
[2]  
Basilico J., 2004, P 21 INT C MACH LEAR, P9
[3]   Causal Embeddings for Recommendation [J].
Bonner, Stephen ;
Vasile, Flavian .
12TH ACM CONFERENCE ON RECOMMENDER SYSTEMS (RECSYS), 2018, :104-112
[4]  
Chen HK, 2019, AAAI CONF ARTIF INTE, P3312
[5]   Counterfactual Samples Synthesizing for Robust Visual Question Answering [J].
Chen, Long ;
Yan, Xin ;
Xiao, Jun ;
Zhang, Hanwang ;
Pu, Shiliang ;
Zhuang, Yueting .
2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2020), 2020, :10797-10806
[6]   Counterfactual Critic Multi-Agent Training for Scene Graph Generation [J].
Chen, Long ;
Zhang, Hanwang ;
Xiao, Jun ;
He, Xiangnan ;
Pu, Shiliang ;
Chang, Shih-Fu .
2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, :4612-4622
[7]   Stabilizing Reinforcement Learning in Dynamic Environment with Application to Online Recommendation [J].
Chen, Shi-Yong ;
Yu, Yang ;
Da, Qing ;
Tan, Jun ;
Huang, Hai-Kuan ;
Tang, Hai-Hong .
KDD'18: PROCEEDINGS OF THE 24TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING, 2018, :1187-1196
[8]   Generative Inverse Deep Reinforcement Learning for Online Recommendation [J].
Chen, Xiaocong ;
Yao, Lina ;
Sun, Aixin ;
Wang, Xianzhi ;
Xu, Xiwei ;
Zhu, Liming .
PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT, CIKM 2021, 2021, :201-210
[9]  
Chen XC, 2021, Arxiv, DOI arXiv:2109.03540
[10]   Locality-Sensitive State-Guided Experience Replay Optimization for Sparse Rewards in Online Recommendation [J].
Chen, Xiaocong ;
Yao, Lina ;
McAuley, Julian ;
Guan, Weili ;
Chang, Xiaojun ;
Wang, Xianzhi .
PROCEEDINGS OF THE 45TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '22), 2022, :1316-1325