On Normative Reinforcement Learning via Safe Reinforcement Learning

Cited by: 0
Authors
Neufeld, Emery A. [1]
Bartocci, Ezio [1]
Ciabattoni, Agata [1]
Affiliation
[1] TU Wien, Vienna, Austria
Source
PRIMA 2022: PRINCIPLES AND PRACTICE OF MULTI-AGENT SYSTEMS | 2023 / Vol. 13753
DOI
10.1007/978-3-031-21203-1_5
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Reinforcement learning (RL) has proven a successful technique for teaching autonomous agents goal-directed behaviour. As RL agents further integrate with our society, they must learn to comply with ethical, social, or legal norms. Defeasible deontic logics are natural formal frameworks to specify and reason about such norms in a transparent way. However, their effective and efficient integration in RL agents remains an open problem. On the other hand, linear temporal logic (LTL) has been successfully employed to synthesize RL policies satisfying, e.g., safety requirements. In this paper, we investigate the extent to which the established machinery for safe reinforcement learning can be leveraged for directing normative behaviour for RL agents. We analyze some of the difficulties that arise from attempting to represent norms with LTL, provide an algorithm for synthesizing LTL specifications from certain normative systems, and analyze its power and limits with a case study.
Pages: 72-89
Number of pages: 18