Policy Teaching Through Reward Function Learning

Times cited: 0
Authors
Zhang, Haoqi [1]
Parkes, David C. [1]
Chen, Yiling [1]
Affiliation
[1] Harvard Univ, Sch Engn & Appl Sci, Cambridge, MA 02138 USA
Source
10TH ACM CONFERENCE ON ELECTRONIC COMMERCE - EC 2009 | 2009
DOI
Not available
Chinese Library Classification (CLC) code
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Policy teaching considers a Markov Decision Process setting in which an interested party aims to influence an agent's decisions by providing limited incentives. In this paper, we consider the specific objective of inducing a pre-specified desired policy. We examine both the case in which the agent's reward function is known to the interested party and the case in which it is unknown, presenting a linear program for the former case and formulating an active, indirect elicitation method for the latter. We provide conditions for logarithmic convergence, and present a polynomial time algorithm that ensures logarithmic convergence with arbitrarily high probability. We also offer practical elicitation heuristics that can be formulated as linear programs, and demonstrate their effectiveness on a policy teaching problem in a simulated ad network setting. We extend our methods to handle partial observations and partial target policies, and provide a game-theoretic interpretation of our methods for handling strategic agents.
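The abstract only summarizes the known-reward case. The condition at its core, namely that an incentive must make the desired policy the agent's optimal choice, can be illustrated with a small sketch. This is a hypothetical illustration, not the authors' linear program. It assumes a discounted MDP with state-based rewards R(s) and a state-based incentive delta(s). It does not search for an incentive or enforce a budget; it only checks whether a given incentive makes the target policy uniquely optimal, using policy evaluation plus a one-step improvement test. The names `evaluate_policy` and `induces_policy` and the two-state example are invented for illustration.

```python
# Hypothetical sketch (not the paper's LP): check whether a state-based
# incentive `delta` makes `target` the agent's uniquely optimal policy.
# Assumptions: discounted MDP, state-based reward R(s), P[s][a] is a list
# of next-state probabilities.

def evaluate_policy(P, R, policy, gamma, iters=2000):
    """Iterative policy evaluation: V(s) = R(s) + gamma * sum_t P[s][a][t] * V(t)."""
    n = len(R)
    V = [0.0] * n
    for _ in range(iters):
        V = [R[s] + gamma * sum(p * V[t] for t, p in enumerate(P[s][policy[s]]))
             for s in range(n)]
    return V

def induces_policy(P, R, delta, target, gamma, eps=1e-6):
    """True if every non-target action is strictly worse than the target
    action under the modified reward R + delta (policy improvement test)."""
    n = len(R)
    R_mod = [R[s] + delta[s] for s in range(n)]
    V = evaluate_policy(P, R_mod, target, gamma)
    for s in range(n):
        for a in range(len(P[s])):
            if a == target[s]:
                continue
            q = R_mod[s] + gamma * sum(p * V[t] for t, p in enumerate(P[s][a]))
            if q > V[s] - eps:  # some deviation is at least as good
                return False
    return True

# Toy example: action 0 leads to state 0 and action 1 leads to state 1.
# The agent prefers state 0, but the interested party wants it in state 1.
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # from state 0
    [[1.0, 0.0], [0.0, 1.0]],  # from state 1
]
R = [1.0, 0.0]
target = [1, 1]
gamma = 0.9

print(induces_policy(P, R, [0.0, 0.0], target, gamma))  # False: no incentive
print(induces_policy(P, R, [0.0, 2.0], target, gamma))  # True
```

Without an incentive, the agent prefers to stay in state 0. An incentive of 2 in state 1 gives V(1) = 20 and V(0) = 19, and both deviations are strictly worse, so the target policy is induced. A linear program could treat delta as decision variables and write these optimality conditions as linear constraints; the exact objective and budget constraints used in the paper are not reproduced here.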
Pages: 295-304
Number of pages: 10