A deep semantics-aware data augmentation method for fault localization

Authors
Hu, Jian [1]
Lei, Yan [1]
Affiliations
[1] Chongqing Univ, Sch Big Data & Software Engn, Chongqing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Imbalanced data; Fault localization; Semantics-aware; Data augmentation; CLONING;
DOI
10.1016/j.infsof.2024.107409
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Context: Fault localization (FL) techniques identify the relationship between program statements and failures by analyzing runtime information. They rely on the statistics of the input data to uncover the correlations underlying it, so the quality of the input data is of utmost importance for FL. In practice, however, passing tests significantly outnumber failing tests for a given fault. This leads to a class imbalance problem that can adversely affect the effectiveness of FL. Objective: To tackle the issue of imbalanced data in fault localization, we propose PRAM: a deeP semantic-awaRe dAta augmentation Method to improve the effectiveness of FL methods. Method: PRAM utilizes program dependencies to enhance the semantic context, thus showing how a failure is caused. Then, PRAM employs the mixup method to synthesize new failing test samples by merging two real failing test cases with a random ratio, which balances the input data. Finally, PRAM feeds the balanced data, consisting of synthesized failing test cases and the original test cases, to FL techniques. To evaluate the effectiveness of PRAM, we conducted large-scale experiments on 330 versions of nine large real programs for six state-of-the-art FL methods, two data optimization methods and two data augmentation methods. Results: Our experimental results show that PRAM outperforms the compared methods in most cases on Top-K metrics and reduces the number of checked statements by 40.38% to 80.04% compared with the original FL methods. Furthermore, PRAM reduces the number of checked statements by 16.92% to 56.98% compared with the data optimization methods and by 12.48% to 26.82% compared with the data augmentation methods. Conclusion: The experimental results show that PRAM is more effective not only than the original FL methods but also than two representative data optimization methods and two data augmentation methods, which indicates that PRAM is a universally effective data augmentation method for various FL methods.
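The mixup step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each test case is represented as a statement-coverage vector (1 if the statement was executed, 0 otherwise), draws the mixing ratio uniformly from [0, 1), and omits the dependency-based context enhancement that PRAM applies beforehand. The function name `balance_with_mixup` and the toy data are hypothetical.

```python
import random


def balance_with_mixup(coverage, labels, seed=0):
    """Add synthetic failing tests until failing and passing counts match.

    coverage: list of per-test coverage vectors (assumed representation).
    labels:   1 for a failing test, 0 for a passing test.
    Each synthetic sample is lam * a + (1 - lam) * b, where a and b are
    real failing tests and lam is a random ratio (uniform here, which is
    an assumption; the abstract only says "a random ratio").
    """
    rng = random.Random(seed)
    failing = [row for row, y in zip(coverage, labels) if y == 1]
    n_passing = sum(1 for y in labels if y == 0)
    n_needed = n_passing - len(failing)

    new_cov = [[float(x) for x in row] for row in coverage]
    new_labels = list(labels)
    if not failing or n_needed <= 0:
        return new_cov, new_labels  # nothing to mix, or already balanced

    for _ in range(n_needed):
        a = rng.choice(failing)  # a and b may coincide when few failing tests exist
        b = rng.choice(failing)
        lam = rng.random()
        new_cov.append([lam * x + (1.0 - lam) * y for x, y in zip(a, b)])
        new_labels.append(1)  # a mix of two failing tests is labeled failing
    return new_cov, new_labels


# Toy data: four passing tests and two failing tests over four statements.
coverage = [
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [1, 1, 1, 0],
    [0, 1, 1, 1],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
]
labels = [0, 0, 0, 0, 1, 1]
balanced_cov, balanced_labels = balance_with_mixup(coverage, labels)
```

With this toy input, two synthetic failing vectors are appended, so the balanced set holds four passing and four failing samples. The synthetic entries are fractional where the two source tests disagree on whether a statement was covered, so a downstream FL method would need to accept real-valued inputs.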
Pages: 12