Policy Teaching Through Reward Function Learning

Cited by: 0

Authors:

Zhang, Haoqi [1]

Parkes, David C. [1]

Chen, Yiling [1]

Affiliations:

[1] Harvard Univ, Sch Engn & Appl Sci, Cambridge, MA 02138 USA

Source:

10TH ACM CONFERENCE ON ELECTRONIC COMMERCE - EC 2009 | 2009

DOI:

Not available

Chinese Library Classification (CLC) number:

TP39 [Computer applications];

Discipline classification codes:

081203 ; 0835 ;

Abstract:

Policy teaching considers a Markov Decision Process setting in which an interested party aims to influence an agent's decisions by providing limited incentives. In this paper, we consider the specific objective of inducing a pre-specified desired policy. We examine both the case in which the agent's reward function is known to the interested party and the case in which it is unknown, presenting a linear program for the former and formulating an active, indirect elicitation method for the latter. We provide conditions for logarithmic convergence, and present a polynomial time algorithm that ensures logarithmic convergence with arbitrarily high probability. We also offer practical elicitation heuristics that can be formulated as linear programs, and demonstrate their effectiveness on a policy teaching problem in a simulated ad network setting. We extend our methods to handle partial observations and partial target policies, and provide a game-theoretic interpretation of our methods for handling strategic agents.

Pages: 295-304

Page count: 10

50 entries in total

[31] Learning reward timing in cortex through reward dependent expression of synaptic plasticity [J].

Gavornik, Jeffrey P. ;

Shuler, Marshall G. Hussain ;

Loewenstein, Yonatan ;

Bear, Mark F. ;

Shouval, Harel Z. .

PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2009, 106 (16) :6826-6831

[32] Transformation of language in teaching and learning policy [J].

Ahmad, Rokiah Rozita ;

Majid, Noriza ;

Mamat, Nur Jumaadzan Zaleha ;

Rambely, Azmin Sham ;

Muda, Nora ;

Jaaman, Saiful Hafizah Hj ;

Suradi, Nur Riza Mohd ;

Ismail, Wan Rosmanira ;

Shahabuddin, Faridatulazna Ahmad ;

Nazar, Roslinda Mohd ;

Samsudin, Humaida Banu ;

Zin, Wan Zawiah Wan ;

Zahari, Marina ;

Rafee, Najib Mahmood .

UNIVERSITI KEBANGSAAN MALAYSIA TEACHING AND LEARNING CONGRESS 2011, VOL I, 2012, 59 :685-691

[33] Teaching social inclusion, public policy and governance through active learning and educational games [J].

Perez-Duran, Ixchel ;

Acebillo-Baque, Miriam ;

Comellas-Bonsfills, Josep M. .

TEACHING PUBLIC ADMINISTRATION, 2024,

[34] Fast Probabilistic Policy Reuse via Reward Function Fitting [J].

Liu, Jinmei ;

Wang, Zhi ;

Chen, Chunlin .

2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2022,

[35] Boosting Policy Learning in Reinforcement Learning via Adaptive Intrinsic Reward Regulation [J].

Zhao, Qian ;

Han, Jinhui ;

Xu, Mao .

IEEE ACCESS, 2024, 12 :2224-2235

[37] LEARNING AND TEACHING THROUGH DISCUSSION [J].

HILL, WF .

CENTRAL STATES SPEECH JOURNAL, 1962, 13 (03) :198-198

[38] Learning Through Teaching Response [J].

Ashton, Rendell W. ;

Burkart, Kristin M. ;

Lenz, Peter H. ;

Kumar, Sunita ;

McCallister, Jennifer W. .

CHEST, 2018, 153 (04) :1082-1083

[39] Reward estimation with scheduled knowledge distillation for dialogue policy learning [J].

Qiu, Junyan ;

Zhang, Haidong ;

Yang, Yiping .

CONNECTION SCIENCE, 2023, 35 (01)

[40] BATCH POLICY LEARNING IN AVERAGE REWARD MARKOV DECISION PROCESSES [J].

Liao, Peng ;

Qi, Zhengling ;

Wan, Runzhe ;

Klasnja, Predrag ;

Murphy, Susan A. .

ANNALS OF STATISTICS, 2022, 50 (06) :3364-3387
