Policy Teaching Through Reward Function Learning

Citations: 0
Authors
Zhang, Haoqi [1]
Parkes, David C. [1]
Chen, Yiling [1]
Affiliations
[1] Harvard Univ, Sch Engn & Appl Sci, Cambridge, MA 02138 USA
Source
10TH ACM CONFERENCE ON ELECTRONIC COMMERCE - EC 2009 | 2009
Keywords
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203; 0835;
Abstract
Policy teaching considers a Markov Decision Process setting in which an interested party aims to influence an agent's decisions by providing limited incentives. In this paper, we consider the specific objective of inducing a pre-specified desired policy. We examine both the case in which the agent's reward function is known to the interested party and the case in which it is unknown, presenting a linear program for the former case and formulating an active, indirect elicitation method for the latter. We provide conditions for logarithmic convergence, and present a polynomial time algorithm that ensures logarithmic convergence with arbitrarily high probability. We also offer practical elicitation heuristics that can be formulated as linear programs, and demonstrate their effectiveness on a policy teaching problem in a simulated ad network setting. We extend our methods to handle partial observations and partial target policies, and provide a game-theoretic interpretation of our methods for handling strategic agents.
Pages: 295-304
Page count: 10