Code-aware fault localization with pre-training and interpretable machine learning

Times Cited: 1
Authors
Zhang, Zhuo [1 ]
Li, Ya [2 ]
Yang, Sha [1 ]
Zhang, Zhanjun [3 ]
Lei, Yan [4 ]
Affiliations
[1] Guangzhou Coll Commerce, Sch Informat Technol & Engn, Guangzhou, Peoples R China
[2] Shanghai Jiao Tong Univ, Ningbo Artificial Intelligence Inst, Ningbo, Peoples R China
[3] Natl Univ Def Technol, Coll Comp, Changsha, Peoples R China
[4] Chongqing Univ, Sch Big Data & Software Engn, Chongqing, Peoples R China
Funding
China Postdoctoral Science Foundation; National Natural Science Foundation of China;
Keywords
Fault localization; Pre-training; Interpretable machine learning;
DOI
10.1016/j.eswa.2023.121689
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Following the rapid development of deep learning, many studies in the field of fault localization (FL) have used deep learning to analyze statements' coverage information (i.e., executed or not executed) and test case results (i.e., failing or passing), showing a strong ability to identify suspicious statements potentially responsible for failures. However, these studies mainly focus on the binary information of executed test cases and neglect to incorporate code snippets and their inner relationships into the learning process. Furthermore, how a complex deep learning model for FL reaches a particular decision is not transparent. These drawbacks may limit the effectiveness of FL. Recently, graph-based pre-training techniques have dramatically improved the state of the art in a variety of code-related tasks such as natural language code search, clone detection, code translation, and code refinement. Meanwhile, interpretable machine learning tackles the problem of non-transparency and enables learning models to explain or present their behavior to humans in an understandable way. In this paper, our insight is to leverage the promising learning ability of graph-based pre-training techniques to learn a feasible model that incorporates code snippets as well as their inner relationships into fault localization, and then to use interpretable machine learning to localize faulty statements. Thus, we propose CodeAwareFL, a code-aware fault localization technique with pre-training and interpretable machine learning. Concretely, CodeAwareFL constructs a variety of code snippets by executing test cases. Next, CodeAwareFL uses the code snippets to extract propagation chains, which show how a set of variables interact with each other to cause a failure. After that, a graph-based pre-trained model is customized for fault localization: CodeAwareFL takes the code snippets and their corresponding propagation chains as inputs, with test results as labels, to conduct the training process. Finally, CodeAwareFL evaluates the suspiciousness of statements with interpretable machine learning techniques. In the experimental study, we choose 12 large programs for comparison. The results show that CodeAwareFL achieves promising results (e.g., 32.43% of faults are ranked within the top 5) and is significantly better than 12 state-of-the-art baselines.
Pages: 13
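The abstract describes a learning-based fault localization pipeline: train a model that maps per-test execution information to pass/fail labels, then use an interpretable-ML step to score each statement's suspiciousness. The sketch below is a minimal, hypothetical illustration of that general idea only, not the CodeAwareFL implementation: the paper's graph-based pre-trained code representations and propagation chains are simplified here to a binary coverage matrix, and a logistic regression's coefficients stand in for the interpretable suspiciousness signal (scikit-learn and NumPy assumed available).

```python
# Hypothetical, simplified sketch of learning-based fault localization.
# NOT the CodeAwareFL method: code snippets and propagation chains are
# replaced by a binary coverage matrix, and a transparent linear model's
# weights serve as the "interpretable" suspiciousness scores.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Rows = test cases, columns = statements; 1 means the statement was
# executed by that test. Labels: 1 = failing test, 0 = passing test.
coverage = np.array([
    [1, 1, 0, 1],   # test 0 (passing)
    [1, 0, 1, 1],   # test 1 (failing)
    [0, 1, 1, 0],   # test 2 (failing)
    [1, 1, 0, 0],   # test 3 (passing)
])
labels = np.array([0, 1, 1, 0])

# Fit a transparent model that maps per-test coverage to pass/fail.
model = LogisticRegression().fit(coverage, labels)

# Interpretable-ML step: each statement's learned coefficient indicates how
# strongly its execution is associated with failing outcomes and is used
# directly as its suspiciousness score.
suspiciousness = model.coef_.ravel()
for stmt in np.argsort(-suspiciousness):
    print(f"statement {stmt}: suspiciousness = {suspiciousness[stmt]:.3f}")
```

In this toy setting, statements executed mostly by failing tests receive the largest coefficients and therefore rank highest, which mirrors (in a heavily simplified form) how an interpretable model can turn learned weights into a statement ranking.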