On Normative Reinforcement Learning via Safe Reinforcement Learning

Cited by: 0
Authors
Neufeld, Emery A. [1 ]
Bartocci, Ezio [1 ]
Ciabattoni, Agata [1 ]
Affiliations
[1] TU Wien, Vienna, Austria
Source
PRIMA 2022: Principles and Practice of Multi-Agent Systems | 2023 / Vol. 13753
DOI
10.1007/978-3-031-21203-1_5
CLC number
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Reinforcement learning (RL) has proven a successful technique for teaching autonomous agents goal-directed behaviour. As RL agents further integrate with our society, they must learn to comply with ethical, social, or legal norms. Defeasible deontic logics are natural formal frameworks for specifying and reasoning about such norms in a transparent way. However, their effective and efficient integration into RL agents remains an open problem. On the other hand, linear temporal logic (LTL) has been successfully employed to synthesize RL policies satisfying, e.g., safety requirements. In this paper, we investigate the extent to which the established machinery for safe reinforcement learning can be leveraged to direct the normative behaviour of RL agents. We analyze some of the difficulties that arise from attempting to represent norms with LTL, provide an algorithm for synthesizing LTL specifications from certain normative systems, and analyze its power and limits with a case study.
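The safe-RL machinery the abstract refers to is often realized as "shielding": an LTL safety property is compiled into a deterministic automaton that monitors the agent's trace, and actions whose successor state would violate the property are filtered out before the agent can take them. The sketch below is a minimal, illustrative version of this idea, not the paper's algorithm; the one-dimensional grid world, the `hazard` label, and the property G(¬hazard) are all assumptions chosen for brevity.

```python
# Minimal shielding sketch for the LTL safety property G(not hazard):
# a two-state safety automaton monitors traces, and the shield keeps
# only those actions whose successor leaves the automaton non-violating.

def step_automaton(q, labels):
    """Automaton states: 0 = compliant, 1 = violation (absorbing)."""
    if q == 1 or "hazard" in labels:
        return 1
    return 0

def labels_of(pos):
    """Toy labelling: cell 3 of a 5-cell line world is hazardous."""
    return {"hazard"} if pos == 3 else set()

def apply_action(pos, action):
    """Actions are -1 (left) and +1 (right), clamped to [0, 4]."""
    return max(0, min(4, pos + action))

def shield(q, pos, actions):
    """Return only the actions that keep the safety automaton out of state 1."""
    return [a for a in actions
            if step_automaton(q, labels_of(apply_action(pos, a))) != 1]

# From cell 2, moving right would enter the hazard cell, so the shield removes it.
safe = shield(0, 2, [-1, +1])
```

In an RL loop, the policy would then choose only among `shield(q, pos, actions)`, so compliance with the safety specification holds during learning as well as at deployment; richer LTL formulas require larger automata but the filtering step is the same.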
Pages: 72-89 (18 pages)