Test Case Selection for Neural Network via Data Mutation

Cited by: 0
Authors
Cao, Xue-Jie [1]
Chen, Jun-Jie [1]
Yan, Ming [1]
You, Han-Mo [1]
Wu, Zhuo [2]
Wang, Zan [1, 2]
Affiliations
[1] College of Intelligence and Computing, Tianjin University, Tianjin
[2] School of New Media and Communication, Tianjin University, Tianjin
Source
Ruan Jian Xue Bao/Journal of Software | 2024, Vol. 35, No. 11
Keywords
data mutation; deep learning; software testing; test case selection
DOI
10.13328/j.cnki.jos.007005
Abstract
Deep neural networks (DNNs) are widely used in autonomous driving, medical diagnosis, speech recognition, face recognition, and other safety-critical fields, so DNN testing is critical to ensuring model quality. However, labeling test cases to judge whether a DNN model's predictions are correct is costly. Selecting the test cases that reveal incorrect behavior of a DNN model and labeling them first helps developers debug the model sooner, which improves the efficiency of DNN testing and helps ensure model quality. This study proposes DMS, a test case selection method based on data mutation. In this method, a data mutation operator is designed and implemented to generate a mutation model that simulates model defects and captures the dynamic pattern by which test cases reveal bugs, so that the bug-revealing ability of each test case can be evaluated. Experiments are conducted on 25 combinations of deep learning test sets and models. The results show that DMS significantly outperforms existing test case selection methods in both the proportion of bug-revealing cases and the diversity of bug-revealing directions among the selected samples. Specifically, with the original test set as the candidate set, DMS identifies 53.85%–99.22% of all bug-revealing test cases when selecting 10% of the test cases. Moreover, when 5% of the test cases are selected, the cases selected by DMS cover almost all bug-revealing directions. Compared with the eight baseline methods, DMS finds 12.38%–71.81% more bug-revealing cases on average, which demonstrates its effectiveness for test case selection. © 2024 Chinese Academy of Sciences. All rights reserved.
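The abstract describes DMS only at a high level, so the following is a minimal sketch of the general idea behind mutation-based test case selection, not the DMS algorithm itself. It assumes that predictions from the original model and from several mutated models are already available, and it scores each test case by the fraction of mutated models that disagree with the original model. The function names, the scoring rule, and the toy data are all hypothetical.

```python
from typing import List, Sequence


def mutation_scores(original_preds: Sequence[int],
                    mutant_preds: Sequence[Sequence[int]]) -> List[float]:
    """Per test case: fraction of mutated models whose predicted label
    differs from the original model's label (an assumed scoring rule)."""
    if not mutant_preds:
        raise ValueError("at least one mutated model is required")
    scores = []
    for i, original_label in enumerate(original_preds):
        disagreements = sum(1 for preds in mutant_preds
                            if preds[i] != original_label)
        scores.append(disagreements / len(mutant_preds))
    return scores


def select_top_fraction(scores: Sequence[float], fraction: float) -> List[int]:
    """Indices of the highest-scoring test cases (at least one is returned).
    Ties are broken by the original order of the test cases."""
    k = max(1, int(len(scores) * fraction))
    ranked = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    return ranked[:k]


# Toy data (invented for illustration): predicted labels for 5 test cases.
original = [0, 1, 2, 1, 0]
mutants = [
    [0, 1, 1, 1, 0],
    [0, 2, 1, 1, 0],
    [0, 1, 0, 1, 1],
]

scores = mutation_scores(original, mutants)
selected = select_top_fraction(scores, 0.4)  # label these cases first
print(selected)
```

In this toy run, the test case on which every mutated model disagrees with the original model is ranked first. A real method would need a scoring rule validated against labeled bug-revealing cases.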
Pages: 4973-4992
Number of pages: 19