QuoTe: Quality-oriented Testing for Deep Learning Systems

Cited by: 2
Authors
Chen, Jialuo [1 ]
Wang, Jingyi [1 ]
Ma, Xingjun [2 ]
Sun, Youcheng [3 ]
Sun, Jun [4 ]
Zhang, Peixin [1 ]
Cheng, Peng [1 ]
Affiliations
[1] Zhejiang Univ, Hangzhou 310027, Peoples R China
[2] Fudan Univ, Shanghai 200433, Peoples R China
[3] Univ Manchester, Manchester M13 9PL, Lancs, England
[4] Singapore Management Univ, Singapore 188065, Singapore
Funding
National Key Research and Development Program of China
Keywords
Deep learning; testing; robustness; fairness
DOI
10.1145/3582573
CLC number
TP31 [Computer Software]
Subject classification number
081202; 0835
Abstract
Recently, there has been significant growth of interest in applying software engineering techniques to the quality assurance of deep learning (DL) systems. One popular direction is DL testing: given a property under test, defects of DL systems are found either by fuzzing or by guided search with the help of certain testing metrics. However, recent studies have revealed that the neuron coverage metrics, which are commonly used by most existing DL testing approaches, are not necessarily correlated with model quality (e.g., robustness, the most studied model property), and are also not an effective measure of confidence in the model quality after testing. In this work, we address this gap by proposing a novel testing framework called QuoTe (i.e., Quality-oriented Testing). A key part of QuoTe is a quantitative measurement of (1) the value of each test case in enhancing the model property of interest (often via retraining) and (2) the convergence quality of the model property improvement. QuoTe utilizes the proposed metric to automatically select or generate valuable test cases for improving model quality. The proposed metric is also a lightweight yet strong indicator of how well the improvement has converged. Extensive experiments on both image and tabular datasets with a variety of model architectures confirm the effectiveness and efficiency of QuoTe in improving DL model quality, that is, robustness and fairness. As a generic quality-oriented testing framework, future adaptations can be made to other domains (e.g., text) as well as other model properties.
Pages: 33