Policy Teaching Through Reward Function Learning

Times cited: 0
Authors
Zhang, Haoqi [1]
Parkes, David C. [1]
Chen, Yiling [1]
Affiliation
[1] Harvard Univ, Sch Engn & Appl Sci, Cambridge, MA 02138 USA
Source
10TH ACM CONFERENCE ON ELECTRONIC COMMERCE - EC 2009 | 2009
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Policy teaching considers a Markov Decision Process setting in which an interested party aims to influence an agent's decisions by providing limited incentives. In this paper, we consider the specific objective of inducing a pre-specified desired policy. We examine both the case in which the agent's reward function is known to the interested party and the case in which it is unknown, presenting a linear program for the former case and formulating an active, indirect elicitation method for the latter. We provide conditions for logarithmic convergence, and present a polynomial time algorithm that ensures logarithmic convergence with arbitrarily high probability. We also offer practical elicitation heuristics that can be formulated as linear programs, and demonstrate their effectiveness on a policy teaching problem in a simulated ad network setting. We extend our methods to handle partial observations and partial target policies, and provide a game-theoretic interpretation of our methods for handling strategic agents.
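To make the known-reward case concrete, the sketch below checks whether a candidate incentive makes a target policy the agent's unique optimal policy. It is not the paper's linear program, which searches for a suitable incentive; it only verifies one that is given. Assumptions not taken from the abstract: a small tabular MDP with state-based rewards R[s] and state-based incentives delta[s], transitions P[s][a][t], discount gamma, approximate policy evaluation by iteration, and a strictness margin eps. The functions `evaluate` and `induces` and the toy MDP are hypothetical.

```python
# Hypothetical sketch (not the paper's LP): verify that a candidate incentive
# makes a target policy strictly optimal for an agent in a known, tabular MDP.
# Assumed model: state-based reward R[s], state-based incentive delta[s],
# transitions P[s][a][t] = Pr(next state t | state s, action a), discount gamma.


def evaluate(policy, P, reward, gamma, iters=2000):
    """Iterative policy evaluation: V(s) = r(s) + gamma * sum_t P[s][pi(s)][t] V(t)."""
    n = len(reward)
    V = [0.0] * n
    for _ in range(iters):
        V = [
            reward[s] + gamma * sum(p * V[t] for t, p in enumerate(P[s][policy[s]]))
            for s in range(n)
        ]
    return V


def induces(target, P, R, delta, gamma, eps=1e-6):
    """True if `target` beats every other action by at least eps in every state
    when the agent maximizes R + delta, i.e. it is the agent's unique optimal policy."""
    shaped = [r + d for r, d in zip(R, delta)]
    V = evaluate(target, P, shaped, gamma)
    for s in range(len(R)):
        # One-step lookahead: take action a in s, then follow the target policy.
        q = [shaped[s] + gamma * sum(p * V[t] for t, p in enumerate(P[s][a]))
             for a in range(len(P[s]))]
        if any(q[a] > q[target[s]] - eps for a in range(len(q)) if a != target[s]):
            return False
    return True


# Toy example: two states; action 0 = stay, action 1 = switch.
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # from state 0: stay -> 0, switch -> 1
    [[0.0, 1.0], [1.0, 0.0]],  # from state 1: stay -> 1, switch -> 0
]
R = [1.0, 0.0]   # the agent prefers state 0
target = [1, 0]  # the interested party wants the agent to settle in state 1

print(induces(target, P, R, [0.0, 0.0], 0.9))  # False: the agent stays in state 0
print(induces(target, P, R, [0.0, 2.0], 0.9))  # True: the incentive on state 1 suffices
```

In this toy instance, working through the Bellman equations shows the incentive on state 1 must exceed 1 before switching (in state 0) and staying (in state 1) are both strictly preferred, so an incentive of 2 works while 0.5 does not. Under the paper's limited-incentive setting, the interested party would solve for such an incentive rather than test candidates by hand.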
Pages: 295-304
Number of pages: 10