QuoTe: Quality-oriented Testing for Deep Learning Systems

Cited by: 2
Authors
Chen, Jialuo [1 ]
Wang, Jingyi [1 ]
Ma, Xingjun [2 ]
Sun, Youcheng [3 ]
Sun, Jun [4 ]
Zhang, Peixin [1 ]
Cheng, Peng [1 ]
Affiliations
[1] Zhejiang Univ, Hangzhou 310027, Peoples R China
[2] Fudan Univ, Shanghai 200433, Peoples R China
[3] Univ Manchester, Manchester M13 9PL, Lancs, England
[4] Singapore Management Univ, Singapore 188065, Singapore
Funding
National Key Research and Development Program of China
Keywords
Deep learning; testing; robustness; fairness
DOI
10.1145/3582573
CLC number
TP31 [Computer Software]
Subject classification number
081202; 0835
Abstract
Recently, there has been significant growth of interest in applying software engineering techniques to the quality assurance of deep learning (DL) systems. One popular direction is DL testing: given a property under test, defects of DL systems are found either by fuzzing or by guided search with the help of certain testing metrics. However, recent studies have revealed that the neuron coverage metrics, which are commonly used by most existing DL testing approaches, are not necessarily correlated with model quality (e.g., robustness, the most studied model property), and are also not an effective measure of confidence in the model quality after testing. In this work, we address this gap by proposing a novel testing framework called QuoTe (i.e., Quality-oriented Testing). A key part of QuoTe is a quantitative measurement of (1) the value of each test case in enhancing the model property of interest (often via retraining) and (2) the convergence quality of the model property improvement. QuoTe utilizes the proposed metric to automatically select or generate valuable test cases for improving model quality. The proposed metric is also a lightweight yet strong indicator of how well the improvement has converged. Extensive experiments on both image and tabular datasets with a variety of model architectures confirm the effectiveness and efficiency of QuoTe in improving DL model quality, that is, robustness and fairness. As a generic quality-oriented testing framework, future adaptations can be made to other domains (e.g., text) as well as other model properties.
Pages: 33