Boundary sampling to boost mutation testing for deep learning models

Cited by: 17
Authors
Shen, Weijun [1,3]
Li, Yanhui [1,2]
Han, Yuanlei [1,2]
Chen, Lin [1,2]
Wu, Di [4]
Zhou, Yuming [1,2]
Xu, Baowen [1,2]
Affiliations
[1] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing 210023, Peoples R China
[2] Nanjing Univ, Dept Comp Sci & Technol, Nanjing 210023, Peoples R China
[3] Nanjing Univ, Software Inst, Nanjing 210023, Peoples R China
[4] Momenta, Nantiancheng Rd, Suzhou, Peoples R China
Funding
National Key Research and Development Program of China; National Natural Science Foundation of China
Keywords
Software testing; Deep learning; Mutation testing; Boundary; Neural network; REGRESSION TEST SELECTION; MUTANT REDUCTION; OPERATORS; NETWORKS;
DOI
10.1016/j.infsof.2020.106413
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Context: The widespread application of Deep Learning (DL) models has raised concerns about their reliability. Because DL follows a data-driven programming paradigm, the quality of the test dataset is critical to assessing a DL model accurately. Recently, researchers have introduced mutation testing into DL testing: mutation operators are applied to a DL model to generate mutants, and the quality of the test dataset is judged by whether the test data can identify those mutants. However, many factors (e.g., heavy labeling effort and high running cost) still hinder the use of mutation testing for DL models. Objective: We seek an approach for selecting a smaller, sensitive, representative, and efficient subset of the whole test dataset to improve current mutation testing for DL models (e.g., by reducing labeling and running cost). Method: We propose boundary sample selection (BSS), which uses the distance of samples to the decision boundary of the DL model as the indicator for constructing an appropriate subset. To evaluate the performance of BSS, we conduct an extensive empirical study with two widely used datasets, three popular DL models, and 14 up-to-date DL mutation operators. Results: We observe that (1) the subsets generated by BSS are much smaller than the whole test set (about 3%-20% of its size); (2) under most mutation operators, our subsets are superior to the whole test sets (by about 9.94-21.63) in observing mutation effects; (3) our subsets can replace the whole test sets to a very high degree (higher than 97%) with respect to mutation score; and (4) the MRR values of our subsets are clearly better (about 2.28-13.19 times higher) than those of the whole test sets. Conclusions: The results show that BSS can help testers save labeling cost, run mutation testing quickly, and identify killed mutants early.
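The selection idea in the Method part can be illustrated with a minimal sketch. The abstract does not say how the distance to the decision boundary is measured, so this sketch assumes a common proxy: the gap between the two largest predicted class probabilities, where a small gap suggests a sample lies near the boundary. The function names and the `ratio` parameter are hypothetical and are not taken from the paper.

```python
def boundary_margin(probs):
    """Gap between the two largest class probabilities.

    Assumption: a smaller gap is used here as a stand-in for
    'closer to the decision boundary'; the paper's actual
    distance measure is not given in the abstract.
    """
    top1, top2 = sorted(probs, reverse=True)[:2]
    return top1 - top2


def select_boundary_samples(prob_rows, ratio=0.1):
    """Return indices of the `ratio` fraction of samples with the smallest margin."""
    k = max(1, round(len(prob_rows) * ratio))
    # Rank sample indices from smallest margin (nearest boundary) to largest.
    ranked = sorted(range(len(prob_rows)),
                    key=lambda i: boundary_margin(prob_rows[i]))
    return sorted(ranked[:k])


# Hypothetical softmax outputs for four test samples over three classes.
example_probs = [
    [0.98, 0.01, 0.01],  # confident prediction, far from the boundary
    [0.40, 0.35, 0.25],  # ambiguous
    [0.51, 0.49, 0.00],  # most ambiguous
    [0.70, 0.20, 0.10],
]
selected = select_boundary_samples(example_probs, ratio=0.5)
```

With `ratio=0.5`, the two samples with the smallest probability gap are kept. Only the selected samples would then need labels and be run against the mutants, which is where the claimed savings in labeling and running cost would come from.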
Pages: 16