Policy Teaching Through Reward Function Learning

Cited by: 0

Authors:

Zhang, Haoqi [1]

Parkes, David C. [1]

Chen, Yiling [1]

Affiliations:

[1] Harvard Univ, Sch Engn & Appl Sci, Cambridge, MA 02138 USA

Source:

10TH ACM CONFERENCE ON ELECTRONIC COMMERCE - EC 2009 | 2009

Keywords:

DOI:

Not available

Chinese Library Classification number:

TP39 [Computer applications];

Discipline classification code:

081203 ; 0835 ;

Abstract:

Policy teaching considers a Markov Decision Process setting in which an interested party aims to influence an agent's decisions by providing limited incentives. In this paper, we consider the specific objective of inducing a pre-specified desired policy. We examine both the case in which the agent's reward function is known to the interested party and the case in which it is unknown, presenting a linear program for the former case and formulating an active, indirect elicitation method for the latter. We provide conditions for logarithmic convergence, and present a polynomial time algorithm that ensures logarithmic convergence with arbitrarily high probability. We also offer practical elicitation heuristics that can be formulated as linear programs, and demonstrate their effectiveness on a policy teaching problem in a simulated ad network setting. We extend our methods to handle partial observations and partial target policies, and provide a game-theoretic interpretation of our methods for handling strategic agents.

Pages: 295-304

Page count: 10

50 items in total

[41] Reward-Free Policy Space Compression for Reinforcement Learning [J].

Mutti, Mirco ;

Del Col, Stefano ;

Restelli, Marcello .

INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 151, 2022, 151

[42] Pessimistic Reward Models for Off-Policy Learning in Recommendation [J].

Jeunen, Olivier ;

Goethals, Bart .

15TH ACM CONFERENCE ON RECOMMENDER SYSTEMS (RECSYS 2021), 2021, :63-74

[43] Transfer Learning for Direct Policy Search: A Reward Shaping Approach [J].

Doncieux, Stephane .

2013 IEEE THIRD JOINT INTERNATIONAL CONFERENCE ON DEVELOPMENT AND LEARNING AND EPIGENETIC ROBOTICS (ICDL), 2013,

[44] Granulocyte Colony Stimulating Factor Enhances Reward Learning through Potentiation of Mesolimbic Dopamine System Function [J].

Kutlu, Munir Gunes ;

Brady, Lillian J. ;

Peck, Emily G. ;

Hofford, Rebecca S. ;

Yorgason, Jordan T. ;

Siciliano, Cody A. ;

Kiraly, Drew D. ;

Calipari, Erin S. .

JOURNAL OF NEUROSCIENCE, 2018, 38 (41) :8845-8859

[45] Reinforcement learning with optimized reward function for stealth applications [J].

Mendonca, Matheus R. F. ;

Bernardino, Heder S. ;

Neto, Raul Fonseca .

ENTERTAINMENT COMPUTING, 2018, 25 :37-47

[46] Sequence Prediction with Unlabeled Data by Reward Function Learning [J].

Wu, Lijun ;

Zhao, Li ;

Qin, Tao ;

Lai, Jianhuang ;

Liu, Tie-Yan .

PROCEEDINGS OF THE TWENTY-SIXTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, :3098-3104

[47] Does pollen function as a reward for honeybees in associative learning? [J].

Gruter, C. ;

Arenas, A. ;

Farina, W. M. .

INSECTES SOCIAUX, 2008, 55 (04) :425-427

[48] Survey of apprenticeship learning based on reward function approximating [J].

Jin, Zhuojun ;

Qian, Hui ;

Chen, Shenyi ;

Zhu, Miaoliang .

Huazhong Keji Daxue Xuebao (Ziran Kexue Ban)/Journal of Huazhong University of Science and Technology (Natural Science Edition), 2008, 36 (SUPPL. 1) :288-290

[49] Design of Reward Function on Reinforcement Learning for Automated Driving [J].

Goto, Takeru ;

Kizumi, Yuki ;

Iwasaki, Shun .

IFAC PAPERSONLINE, 2023, 56 (02) :7948-7953

[50] Unsupervised Reinforcement Learning For Video Summarization Reward Function [J].

Wang, Lei ;

Zhu, Yaping ;

Pan, Hong .

PROCEEDINGS OF 2019 INTERNATIONAL CONFERENCE ON IMAGE, VIDEO AND SIGNAL PROCESSING (IVSP 2019), 2019, :40-44
