Code-aware fault localization with pre-training and interpretable machine learning

Cited by: 1
Authors
Zhang, Zhuo [1 ]
Li, Ya [2 ]
Yang, Sha [1 ]
Zhang, Zhanjun [3 ]
Lei, Yan [4 ]
Affiliations
[1] Guangzhou Coll Commerce, Sch Informat Technol & Engn, Guangzhou, Peoples R China
[2] Shanghai Jiao Tong Univ, Ningbo Artificial Intelligence Inst, Ningbo, Peoples R China
[3] Natl Univ Def Technol, Coll Comp, Changsha, Peoples R China
[4] Chongqing Univ, Sch Big Data & Software Engn, Chongqing, Peoples R China
Funding
China Postdoctoral Science Foundation; National Natural Science Foundation of China;
Keywords
Fault localization; Pre-training; Interpretable machine learning;
DOI
10.1016/j.eswa.2023.121689
CLC Classification Number
TP18 [Theory of Artificial Intelligence];
Discipline Classification Code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Following the rapid development of deep learning, many studies in the field of fault localization (FL) have used deep learning to analyze statements' coverage information (i.e., executed or not executed) and test case results (i.e., failing or passing), showing a remarkable ability to identify suspicious statements potentially responsible for failures. However, these approaches mainly attend to the binary information of executed test cases and do not incorporate code snippets and their inner relationships into the learning process. Furthermore, how a complex deep learning model for FL reaches a particular decision is not transparent. These drawbacks may limit the effectiveness of FL. Recently, graph-based pre-training techniques have dramatically improved the state of the art in a variety of code-related tasks such as natural language code search, clone detection, code translation, and code refinement. In addition, interpretable machine learning tackles the problem of non-transparency and enables learning models to explain or present their behavior to humans in an understandable way. In this paper, our insight is to leverage the promising learning ability of graph-based pre-training techniques to learn a model that incorporates code snippets and their inner relationships into fault localization, and then to use interpretable machine learning to localize faulty statements. Thus, we propose CodeAwareFL, a code-aware fault localization technique with pre-training and interpretable machine learning. Concretely, CodeAwareFL constructs a variety of code snippets by executing test cases. Next, CodeAwareFL uses the code snippets to extract propagation chains, which show how a set of variables interact with each other to cause a failure. After that, a graph-based pre-trained model is customized for fault localization: CodeAwareFL takes the code snippets and their corresponding propagation chains as inputs, with test results as labels, to conduct the training process. Finally, CodeAwareFL evaluates the suspiciousness of statements with interpretable machine learning techniques. In the experimental study, we use 12 large-sized programs for comparison. The results show that CodeAwareFL achieves promising results (e.g., 32.43% of faults are ranked within the top 5) and is significantly better than 12 state-of-the-art baselines.
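To make the workflow described in the abstract concrete, the sketch below treats fault localization as an interpretable learning problem over per-test features. It is an illustration only: a plain logistic regression stands in for CodeAwareFL's graph-based pre-trained encoder, and the inputs (coverage_matrix, chain_matrix, test_results) are small hypothetical examples rather than data or code from the paper.

# Illustrative sketch only: logistic regression stands in for CodeAwareFL's
# graph-based pre-trained encoder; all inputs below are hypothetical toy data.
import numpy as np
from sklearn.linear_model import LogisticRegression

# One row per test case, one column per statement; 1 = statement executed.
coverage_matrix = np.array([
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 1, 1],
])

# Propagation-chain features (assumed encoding): 1 = the statement lies on a
# chain of variable interactions reaching the checked output in that test run.
chain_matrix = np.array([
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
])

# Test outcomes used as training labels: 1 = failing, 0 = passing.
test_results = np.array([1, 0, 1, 1])

# Each test case is represented by its coverage and propagation-chain features.
features = np.hstack([coverage_matrix, chain_matrix])

# Fit an interpretable classifier that predicts test failure from the features.
clf = LogisticRegression().fit(features, test_results)

# Interpret the model: score each statement by the sum of the learned weights
# of its coverage and chain features (larger weight -> stronger link to failure).
n_statements = coverage_matrix.shape[1]
weights = clf.coef_[0]
suspiciousness = weights[:n_statements] + weights[n_statements:]

# Rank statements from most to least suspicious.
for stmt in np.argsort(-suspiciousness):
    print(f"statement {stmt}: suspiciousness = {suspiciousness[stmt]:.3f}")

In the paper itself, the classifier's inputs are code snippets and propagation chains encoded by the customized graph-based pre-trained model, and statement suspiciousness is derived from interpretable machine learning over that model rather than from raw linear weights as above.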
Pages: 13
Related Papers
50 records in total
  • [31] Emotion-Aware Multimodal Pre-training for Image-Grounded Emotional Response Generation
    Tian, Zhiliang
    Wen, Zhihua
    Wu, Zhenghao
    Song, Yiping
    Tang, Jintao
    Li, Dongsheng
    Zhang, Nevin L.
    DATABASE SYSTEMS FOR ADVANCED APPLICATIONS, DASFAA 2022, PT III, 2022, : 3 - 19
  • [32] Supply Chain Security: Pre-training Model for Python Source Code Vulnerability Detection
    Le, Yiwang
    Li, Hui
    Wang, Bin
    Luo, Zhixiong
    Yang, Ao
    Ma, Ziheng
    2024 3RD INTERNATIONAL JOINT CONFERENCE ON INFORMATION AND COMMUNICATION ENGINEERING, JCICE 2024, 2024, : 150 - 155
  • [33] ConCL: Concept Contrastive Learning for Dense Prediction Pre-training in Pathology Images
    Yang, Jiawei
    Chen, Hanbo
    Liang, Yuan
    Huang, Junzhou
    He, Lei
    Yao, Jianhua
    COMPUTER VISION, ECCV 2022, PT XXI, 2022, 13681 : 523 - 539
  • [34] XCODE: Towards Cross-Language Code Representation with Large-Scale Pre-Training
    Lin, Zehao
    Li, Guodun
    Zhang, Jingfeng
    Deng, Yue
    Zeng, Xiangji
    Zhang, Yin
    Wan, Yao
    ACM TRANSACTIONS ON SOFTWARE ENGINEERING AND METHODOLOGY, 2022, 31 (03)
  • [35] Pre-Training Model and Client Selection Optimization for Improving Federated Learning Efficiency
    Ge, Bingchen
    Zhou, Ying
    Xie, Liping
    Kou, Lirong
    2024 9TH INTERNATIONAL CONFERENCE ON ELECTRONIC TECHNOLOGY AND INFORMATION SCIENCE, ICETIS 2024, 2024, : 650 - 660
  • [36] JASS: Japanese-specific Sequence to Sequence Pre-training for Neural Machine Translation
    Mao, Zhuoyuan
    Cromieres, Fabien
    Dabre, Raj
    Song, Haiyue
    Kurohashi, Sadao
    PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020), 2020, : 3683 - 3691
  • [37] Pre-training for Spoken Language Understanding with Joint Textual and Phonetic Representation Learning
    Chen, Qian
    Wang, Wen
    Zhang, Qinglin
    INTERSPEECH 2021, 2021, : 1244 - 1248
  • [38] Deep learning and pre-training technology for encrypted traffic classification: A comprehensive review
    Dong, Wenqi
    Yu, Jing
    Lin, Xinjie
    Gou, Gaopeng
    Xiong, Gang
    NEUROCOMPUTING, 2025, 617
  • [39] Low-Resource Neural Machine Translation Using XLNet Pre-training Model
    Wu, Nier
    Hou, Hongxu
    Guo, Ziyue
    Zheng, Wei
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING, ICANN 2021, PT V, 2021, 12895 : 503 - 514
  • [40] GENERATION OF SYNTHETIC STRUCTURAL MAGNETIC RESONANCE IMAGES FOR DEEP LEARNING PRE-TRAINING
    Castro, Eduardo
    Ulloa, Alvaro
    Plis, Sergey M.
    Turner, Jessica A.
    Calhoun, Vince D.
    2015 IEEE 12TH INTERNATIONAL SYMPOSIUM ON BIOMEDICAL IMAGING (ISBI), 2015, : 1057 - 1060