Testing for Fault Diversity in Reinforcement Learning

Cited by: 1
Authors
Mazouni, Quentin [1 ]
Spieker, Helge [1 ]
Gotlieb, Arnaud [1 ]
Acher, Mathieu [2 ]
Affiliations
[1] Simula Res Lab, Oslo, Norway
[2] Univ Rennes, Inria, INSA Rennes, CNRS,IRISA, Rennes, France
Source
PROCEEDINGS OF THE 2024 IEEE/ACM INTERNATIONAL CONFERENCE ON AUTOMATION OF SOFTWARE TEST, AST 2024 | 2024
Keywords
Software Testing; Reinforcement Learning; Quality Diversity;
DOI
10.1145/3644032.3644458
CLC Classification Number
TP31 [Computer Software];
Discipline Codes
081202; 0835;
Abstract
Reinforcement Learning is the premier technique for approaching sequential decision problems, including complex tasks such as driving cars and landing spacecraft. Among software validation and verification practices, testing for functional fault detection is a convenient way to build trustworthiness in the learned decision model. While recent works seek to maximise the number of detected faults, none consider fault characterisation during the search for more diversity. We argue that policy testing should not merely find as many failures as possible (e.g., inputs that trigger similar car crashes) but rather aim to reveal faults in the model that are as informative and diverse as possible. In this paper, we explore the use of quality diversity optimisation to solve the problem of fault diversity in policy testing. Quality diversity (QD) optimisation is a type of evolutionary algorithm for solving hard combinatorial optimisation problems where high-quality, diverse solutions are sought. We define and address the underlying challenges of adapting QD optimisation to the testing of action policies. Furthermore, we compare classical QD optimisers to state-of-the-art frameworks dedicated to policy testing, both in terms of search efficiency and fault diversity. We show that QD optimisation, while being conceptually simple and generally applicable, effectively finds more diverse faults in the decision model, and we conclude that QD-based policy testing is a promising approach.
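The abstract's core idea, QD optimisation applied to fault search, can be illustrated with a minimal MAP-Elites-style sketch. Everything here (the toy policy, the input-based behaviour descriptor, the fitness function) is an illustrative assumption, not the paper's actual setup: fault-triggering test inputs are archived per descriptor cell, so the search is rewarded for covering many distinct failure behaviours rather than rediscovering one.

```python
# MAP-Elites-style quality-diversity sketch for policy testing (illustrative
# assumptions only; not the paper's implementation).
import random

random.seed(0)
GRID = 10  # cells per descriptor dimension

def toy_policy(x):
    # Stand-in "decision model": it faults when the input leaves a safe band.
    return abs(x) > 0.8  # True = fault triggered

def descriptor(x):
    # Behaviour descriptor: discretise the input into a grid cell, so faults
    # landing in different cells count as "diverse".
    return min(GRID - 1, int((x + 1.0) / 2.0 * GRID))

def fitness(x):
    # Quality measure: how strongly the input stresses the policy (assumed).
    return abs(x)

def map_elites(iterations=2000):
    archive = {}  # cell -> (fitness, fault-revealing input)
    for _ in range(iterations):
        if archive and random.random() < 0.8:
            # Mutate an elite already in the archive.
            _, parent = random.choice(list(archive.values()))
            x = max(-1.0, min(1.0, parent + random.gauss(0.0, 0.1)))
        else:
            # Otherwise sample a fresh random test input.
            x = random.uniform(-1.0, 1.0)
        if not toy_policy(x):
            continue  # keep only inputs that actually reveal a fault
        cell = descriptor(x)
        if cell not in archive or fitness(x) > archive[cell][0]:
            archive[cell] = (fitness(x), x)  # new or improved elite
    return archive

faults = map_elites()
# Each occupied cell holds one distinct fault; more cells = more diversity.
print(len(faults), "diverse faults found")
```

With this toy policy only the two outer descriptor cells can ever hold faults, which mirrors the paper's point: counting raw failures overstates progress, while the archive's occupied cells directly measure fault diversity.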
Pages: 136-146
Page count: 11