Code-aware fault localization with pre-training and interpretable machine learning

Cited by: 1
Authors
Zhang, Zhuo [1 ]
Li, Ya [2 ]
Yang, Sha [1 ]
Zhang, Zhanjun [3 ]
Lei, Yan [4 ]
Affiliations
[1] Guangzhou Coll Commerce, Sch Informat Technol & Engn, Guangzhou, Peoples R China
[2] Shanghai Jiao Tong Univ, Ningbo Artificial Intelligence Inst, Ningbo, Peoples R China
[3] Natl Univ Def Technol, Coll Comp, Changsha, Peoples R China
[4] Chongqing Univ, Sch Big Data & Software Engn, Chongqing, Peoples R China
Funding
China Postdoctoral Science Foundation; National Natural Science Foundation of China;
Keywords
Fault localization; Pre-training; Interpretable machine learning;
DOI
10.1016/j.eswa.2023.121689
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
With the rapid development of deep learning, many studies in the field of fault localization (FL) have used deep learning to analyze statement coverage information (i.e., executed or not executed) and test case results (i.e., failing or passing), showing a strong ability to identify suspicious statements potentially responsible for failures. However, these approaches mainly attend to the binary information of executed test cases and neglect to incorporate code snippets and their inner relationships into the learning process. Furthermore, how a complex deep learning model for FL reaches a particular decision is not transparent. These drawbacks may limit the effectiveness of FL. Recently, graph-based pre-training techniques have dramatically improved the state of the art in a variety of code-related tasks such as natural language code search, clone detection, code translation, and code refinement. Meanwhile, interpretable machine learning tackles the problem of non-transparency and enables learning models to explain or present their behavior to humans in an understandable way. In this paper, our insight is to leverage the promising learning ability of graph-based pre-training techniques to learn a model that incorporates code snippets and their inner relationships into fault localization, and then to use interpretable machine learning to localize faulty statements. Thus, we propose CodeAwareFL, a code-aware fault localization technique with pre-training and interpretable machine learning. Concretely, CodeAwareFL constructs a variety of code snippets by executing test cases. Next, CodeAwareFL uses the code snippets to extract propagation chains, which show how a set of variables interact with each other to cause a failure. After that, a graph-based pre-trained model is customized for fault localization: CodeAwareFL takes the code snippets and their corresponding propagation chains as inputs, with test results as labels, to conduct the training process. Finally, CodeAwareFL evaluates the suspiciousness of statements with interpretable machine learning techniques. In the experimental study, we choose 12 large programs for comparison. The results show that CodeAwareFL achieves promising results (e.g., 32.43% of faults are ranked within the top 5) and significantly outperforms 12 state-of-the-art baselines.
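The pipeline the abstract describes boils down to: train a model that predicts test outcomes from code-aware execution features, then read statement suspiciousness out of the model's explanation. The sketch below is a minimal illustration of that idea, not the authors' implementation: it substitutes raw statement-coverage features for the graph-based pre-trained encoder and propagation chains, and uses logistic-regression coefficients as the interpretable suspiciousness scores. All data and names in it are invented for illustration.

```python
# Minimal, illustrative sketch of a CodeAwareFL-style loop -- NOT the paper's
# implementation. The paper encodes code snippets and propagation chains with
# a graph-based pre-trained model; here raw coverage stands in for those
# features, and model coefficients stand in for the interpretability step.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy program with 5 statements; rows = test cases, columns = statements.
# 1 = the test executed the statement, 0 = it did not.
coverage = np.array([
    [1, 1, 0, 1, 0],   # test 1
    [1, 0, 1, 1, 1],   # test 2
    [1, 1, 1, 0, 1],   # test 3
    [1, 0, 1, 1, 0],   # test 4
])
# Test outcomes used as training labels: 1 = failing, 0 = passing.
labels = np.array([0, 1, 0, 1])

# Train an interpretable classifier to predict failure from execution features.
clf = LogisticRegression().fit(coverage, labels)

# Interpret the model: a statement whose execution pushes the prediction
# toward "failing" receives a higher suspiciousness score (its coefficient).
suspiciousness = clf.coef_[0]
for rank, stmt in enumerate(np.argsort(-suspiciousness), start=1):
    print(f"rank {rank}: statement {stmt + 1} (score {suspiciousness[stmt]:+.3f})")
```

In the paper's actual setting, the feature extraction and attribution are far richer, but the ranking step at the end (sorting statements by how strongly they explain failing runs) is the common shape of interpretable-ML fault localization.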
Pages: 13