Code-aware fault localization with pre-training and interpretable machine learning

Cited by: 1
Authors
Zhang, Zhuo [1 ]
Li, Ya [2 ]
Yang, Sha [1 ]
Zhang, Zhanjun [3 ]
Lei, Yan [4 ]
Affiliations
[1] Guangzhou Coll Commerce, Sch Informat Technol & Engn, Guangzhou, Peoples R China
[2] Shanghai Jiao Tong Univ, Ningbo Artificial Intelligence Inst, Ningbo, Peoples R China
[3] Natl Univ Def Technol, Coll Comp, Changsha, Peoples R China
[4] Chongqing Univ, Sch Big Data & Software Engn, Chongqing, Peoples R China
Funding
China Postdoctoral Science Foundation; National Natural Science Foundation of China;
Keywords
Fault localization; Pre-training; Interpretable machine learning;
DOI
10.1016/j.eswa.2023.121689
CLC Classification Number
TP18 [Theory of Artificial Intelligence];
Discipline Classification Code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Following the rapid development of deep learning, many studies in the field of fault localization (FL) have used deep learning to analyze statements' coverage information (i.e., executed or not executed) and test case results (i.e., failing or passing), showing a remarkable ability to identify suspicious statements potentially responsible for failures. However, these approaches mainly attend to the binary information of executed test cases and do not incorporate code snippets and their inner relationships into the learning process. Furthermore, how a complex deep learning model for FL reaches a particular decision is not transparent. These drawbacks may limit the effectiveness of FL. Recently, graph-based pre-training techniques have dramatically improved the state of the art in a variety of code-related tasks such as natural language code search, clone detection, code translation, and code refinement. In addition, interpretable machine learning tackles the problem of non-transparency and enables learning models to explain or present their behavior to humans in an understandable way. In this paper, our insight is to leverage the promising learning ability of graph-based pre-training techniques to learn a model that incorporates code snippets and their inner relationships into fault localization, and then to use interpretable machine learning to localize faulty statements. Thus, we propose CodeAwareFL, a code-aware fault localization technique with pre-training and interpretable machine learning. Concretely, CodeAwareFL constructs a variety of code snippets by executing test cases. Next, CodeAwareFL uses the code snippets to extract propagation chains, which show how a set of variables interact with each other to cause a failure. After that, a graph-based pre-trained model is customized for fault localization: CodeAwareFL takes the code snippets and their corresponding propagation chains as inputs, with test results as labels, to conduct the training process. Finally, CodeAwareFL evaluates the suspiciousness of statements with interpretable machine learning techniques. In the experimental study, we use 12 large-sized programs for comparison. The results show that CodeAwareFL achieves promising results (e.g., 32.43% of faults are ranked within the top 5) and is significantly better than 12 state-of-the-art baselines.
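To make the workflow described in the abstract concrete, the sketch below treats fault localization as an interpretable learning problem over per-test features. It is an illustration only: a plain logistic regression stands in for CodeAwareFL's graph-based pre-trained encoder, and the inputs (coverage_matrix, chain_matrix, test_results) are small hypothetical examples rather than data or code from the paper.

# Illustrative sketch only: logistic regression stands in for CodeAwareFL's
# graph-based pre-trained encoder; all inputs below are hypothetical toy data.
import numpy as np
from sklearn.linear_model import LogisticRegression

# One row per test case, one column per statement; 1 = statement executed.
coverage_matrix = np.array([
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 1, 1],
])

# Propagation-chain features (assumed encoding): 1 = the statement lies on a
# chain of variable interactions reaching the checked output in that test run.
chain_matrix = np.array([
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
])

# Test outcomes used as training labels: 1 = failing, 0 = passing.
test_results = np.array([1, 0, 1, 1])

# Each test case is represented by its coverage and propagation-chain features.
features = np.hstack([coverage_matrix, chain_matrix])

# Fit an interpretable classifier that predicts test failure from the features.
clf = LogisticRegression().fit(features, test_results)

# Interpret the model: score each statement by the sum of the learned weights
# of its coverage and chain features (larger weight -> stronger link to failure).
n_statements = coverage_matrix.shape[1]
weights = clf.coef_[0]
suspiciousness = weights[:n_statements] + weights[n_statements:]

# Rank statements from most to least suspicious.
for stmt in np.argsort(-suspiciousness):
    print(f"statement {stmt}: suspiciousness = {suspiciousness[stmt]:.3f}")

In the paper itself, the classifier's inputs are code snippets and propagation chains encoded by the customized graph-based pre-trained model, and statement suspiciousness is derived from interpretable machine learning over that model rather than from raw linear weights as above.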
Pages: 13
Related Papers
50 records in total
  • [31] Emotion-Aware Multimodal Pre-training for Image-Grounded Emotional Response Generation
    Tian, Zhiliang
    Wen, Zhihua
    Wu, Zhenghao
    Song, Yiping
    Tang, Jintao
    Li, Dongsheng
    Zhang, Nevin L.
    DATABASE SYSTEMS FOR ADVANCED APPLICATIONS, DASFAA 2022, PT III, 2022, : 3 - 19
  • [32] Supply Chain Security: Pre-training Model for Python Source Code Vulnerability Detection
    Le, Yiwang
    Li, Hui
    Wang, Bin
    Luo, Zhixiong
    Yang, Ao
    Ma, Ziheng
    2024 3RD INTERNATIONAL JOINT CONFERENCE ON INFORMATION AND COMMUNICATION ENGINEERING, JCICE 2024, 2024, : 150 - 155
  • [33] ConCL: Concept Contrastive Learning for Dense Prediction Pre-training in Pathology Images
    Yang, Jiawei
    Chen, Hanbo
    Liang, Yuan
    Huang, Junzhou
    He, Lei
    Yao, Jianhua
    COMPUTER VISION, ECCV 2022, PT XXI, 2022, 13681 : 523 - 539
  • [34] XCODE: Towards Cross-Language Code Representation with Large-Scale Pre-Training
    Lin, Zehao
    Li, Guodun
    Zhang, Jingfeng
    Deng, Yue
    Zeng, Xiangji
    Zhang, Yin
    Wan, Yao
    ACM TRANSACTIONS ON SOFTWARE ENGINEERING AND METHODOLOGY, 2022, 31 (03)
  • [35] Pre-Training Model and Client Selection Optimization for Improving Federated Learning Efficiency
    Ge, Bingchen
    Zhou, Ying
    Xie, Liping
    Kou, Lirong
    2024 9TH INTERNATIONAL CONFERENCE ON ELECTRONIC TECHNOLOGY AND INFORMATION SCIENCE, ICETIS 2024, 2024, : 650 - 660
  • [36] JASS: Japanese-specific Sequence to Sequence Pre-training for Neural Machine Translation
    Mao, Zhuoyuan
    Cromieres, Fabien
    Dabre, Raj
    Song, Haiyue
    Kurohashi, Sadao
    PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020), 2020, : 3683 - 3691
  • [37] Pre-training for Spoken Language Understanding with Joint Textual and Phonetic Representation Learning
    Chen, Qian
    Wang, Wen
    Zhang, Qinglin
    INTERSPEECH 2021, 2021, : 1244 - 1248
  • [38] Deep learning and pre-training technology for encrypted traffic classification: A comprehensive review
    Dong, Wenqi
    Yu, Jing
    Lin, Xinjie
    Gou, Gaopeng
    Xiong, Gang
    NEUROCOMPUTING, 2025, 617
  • [39] Low-Resource Neural Machine Translation Using XLNet Pre-training Model
    Wu, Nier
    Hou, Hongxu
    Guo, Ziyue
    Zheng, Wei
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING, ICANN 2021, PT V, 2021, 12895 : 503 - 514
  • [40] GENERATION OF SYNTHETIC STRUCTURAL MAGNETIC RESONANCE IMAGES FOR DEEP LEARNING PRE-TRAINING
    Castro, Eduardo
    Ulloa, Alvaro
    Plis, Sergey M.
    Turner, Jessica A.
    Calhoun, Vince D.
    2015 IEEE 12TH INTERNATIONAL SYMPOSIUM ON BIOMEDICAL IMAGING (ISBI), 2015, : 1057 - 1060