Black-Box Testing of Deep Neural Networks through Test Case Diversity

Cited by: 24
Authors
Aghababaeyan, Zohreh [1 ]
Abdellatif, Manel [2 ,3 ]
Briand, Lionel [3 ,4 ]
Ramesh, S. [5 ]
Bagherzadeh, Mojtaba [3 ]
Affiliations
[1] Univ Ottawa, Sch EECS, Ottawa, ON K1N 6N5, Canada
[2] Ecole Technol Super, Software & Informat Technol Engn Dept, Montreal, PQ H3C 1K3, Canada
[3] Univ Ottawa, Sch EECS, Ottawa, ON K1N 6N5, Canada
[4] Univ Luxembourg, SnT Ctr Secur Reliabil & Trust, L-4365 Esch Sur Alzette, Luxembourg
[5] Gen Motors, Dept Res & Dev, Warren, MI 48092 USA
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC);
Keywords
Measurement; Testing; Feature extraction; Closed box; Fault detection; Neurons; Computational modeling; Coverage; deep neural network; diversity; faults; test; CLASSIFICATION; EFFICIENT; DISTANCE;
DOI
10.1109/TSE.2023.3243522
Chinese Library Classification (CLC)
TP31 [Computer Software];
Discipline Codes
081202 ; 0835 ;
Abstract
Deep Neural Networks (DNNs) have been extensively used in many areas including image processing, medical diagnostics and autonomous driving. However, DNNs can exhibit erroneous behaviours that may lead to critical errors, especially when used in safety-critical systems. Inspired by testing techniques for traditional software systems, researchers have proposed neuron coverage criteria, as an analogy to source code coverage, to guide the testing of DNNs. Despite very active research on DNN coverage, several recent studies have questioned the usefulness of such criteria in guiding DNN testing. Further, from a practical standpoint, these criteria are white-box as they require access to the internals or training data of DNNs, which is often not feasible or convenient. Measuring such coverage requires executing DNNs with candidate inputs to guide testing, which is not an option in many practical contexts. In this paper, we investigate diversity metrics as an alternative to white-box coverage criteria. For the previously mentioned reasons, we require such metrics to be black-box and not rely on the execution and outputs of DNNs under test. To this end, we first select and adapt three diversity metrics and study, in a controlled manner, their capacity to measure actual diversity in input sets. We then analyze their statistical association with fault detection using four datasets and five DNNs. We further compare diversity with state-of-the-art white-box coverage criteria. As a mechanism to enable such analysis, we also propose a novel way to estimate fault detection in DNNs. Our experiments show that relying on the diversity of image features embedded in test input sets is a more reliable indicator than coverage criteria to effectively guide DNN testing. Indeed, we found that one of our selected black-box diversity metrics far outperforms existing coverage criteria in terms of fault-revealing capability and computational time. Results also confirm the suspicions that state-of-the-art coverage criteria are not adequate to guide the construction of test input sets to detect as many faults as possible using natural inputs.
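The record does not spell out the diversity metrics themselves, so below is a minimal, illustrative sketch of one black-box notion of input-set diversity: a geometric-diversity score computed over image feature embeddings (e.g., produced by a pretrained extractor such as VGG-16). The function name, the log-determinant formulation, and the `eps` regularization are assumptions made to keep the example self-contained and numerically stable; they are not the authors' exact implementation.

```python
import numpy as np


def geometric_diversity(features: np.ndarray, eps: float = 1e-8) -> float:
    """Illustrative geometric-diversity score for a test input set.

    `features` is an (n_inputs, n_dims) matrix of image feature vectors,
    assumed to come from a pretrained extractor (e.g., VGG-16 embeddings).
    The score is the log-determinant of the Gram matrix of the
    L2-normalized features: larger values mean the inputs span a larger
    volume in feature space, i.e., they are more diverse.
    """
    # L2-normalize each feature vector so the score reflects angular spread.
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    V = features / np.maximum(norms, eps)
    # Gram (similarity) matrix of the input set.
    gram = V @ V.T
    # Regularize the diagonal so near-duplicate inputs do not make the
    # determinant degenerate, then take the log-determinant.
    gram += eps * np.eye(gram.shape[0])
    _, logdet = np.linalg.slogdet(gram)
    return float(logdet)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Spread-out embeddings vs. near-duplicates of a single embedding.
    diverse = rng.normal(size=(20, 128))
    redundant = np.repeat(rng.normal(size=(1, 128)), 20, axis=0) \
        + 0.01 * rng.normal(size=(20, 128))
    print("diverse set  :", geometric_diversity(diverse))
    print("redundant set:", geometric_diversity(redundant))
```

Run as a script, the diverse set scores close to zero (its Gram matrix is near the identity) while the redundant set scores far lower, which is the behaviour a diversity-guided test selection would exploit.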
Pages: 3182 - 3204
Number of pages: 23