A light-weight data augmentation method for fault localization

Cited by: 9
Authors
Hu, Jian [1]
Xie, Huan [1]
Lei, Yan [1]
Yu, Ke [1]
Affiliations
[1] Chongqing Univ, Sch Big Data & Software Engn, Chongqing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Imbalanced data; Fault localization; Feature selection; Data augmentation; EFFICIENT; CLONING;
DOI
10.1016/j.infsof.2023.107148
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Context: Fault localization (FL) is essentially a search over the space of program statements to find suspicious entities that might have caused a program failure. However, the input data is high-dimensional and extremely imbalanced, because real-world programs are large and failing test cases are far fewer than passing ones; this limits the effectiveness and efficiency of existing FL methods. The state-of-the-art FL method (Aeneas) addresses the imbalance and the high dimensionality, but through a complex and time-consuming process.
Objective: Given the limited effectiveness of the original FL methods and the low efficiency of Aeneas, this paper proposes Lamont, a light-weight data augmentation method that aims to improve on the effectiveness of the original FL methods and on the efficiency of Aeneas.
Methods: Lamont uses revised linear discriminant analysis (LDA) to reduce the dimensionality of the original coverage matrix and leverages the synthetic minority over-sampling technique (SMOTE) to generate synthesized failing tests. The balanced, reduced-dimensionality coverage matrix is then fed into FL methods to obtain a ranked list of suspicious statements. To evaluate efficiency and effectiveness, we compare Lamont with six representative FL methods and with Aeneas on 458 versions of 10 real-life programs.
Results: Lamont outperforms the six original FL methods in most cases on the Top-K metric and reduces the number of statements that need to be checked by 17.45% to 79.81%. Furthermore, Lamont reduces running time relative to the state-of-the-art data augmentation method Aeneas by 55.33% to 68.39%, with comparable effectiveness.
Conclusion: This work conducts a large-scale experimental study of the effectiveness and efficiency of Lamont. The results support two conclusions. First, Lamont is more effective than the original FL methods. Second, Lamont is more efficient than Aeneas, with similar effectiveness, across the six FL methods.
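The balancing and ranking steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the revised LDA step is omitted because the abstract does not specify it, the function names `smote_failing` and `ochiai` are hypothetical, the toy coverage matrix is invented for illustration, and Ochiai is assumed here only as one common spectrum-based formula (the abstract does not name the six FL methods).

```python
import math
import random


def smote_failing(failing, n_new, k=2, seed=0):
    """Create n_new synthetic failing coverage vectors, SMOTE-style:
    pick a failing test, pick one of its k nearest failing neighbours,
    and interpolate at a random point on the segment between them."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(failing)
        neighbours = sorted(
            (v for v in failing if v is not base),
            key=lambda v: math.dist(base, v),
        )[:k]
        if not neighbours:  # only one failing test: fall back to a copy
            synthetic.append(list(base))
            continue
        other = rng.choice(neighbours)
        gap = rng.random()
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def ochiai(passing, failing):
    """Ochiai suspiciousness per statement (column).
    Coverage values may be fractional after interpolation."""
    total_fail = len(failing)
    scores = []
    for j in range(len(failing[0])):
        ef = sum(row[j] for row in failing)  # failing coverage of statement j
        ep = sum(row[j] for row in passing)  # passing coverage of statement j
        denom = math.sqrt(total_fail * (ef + ep))
        scores.append(ef / denom if denom > 0 else 0.0)
    return scores


# Toy coverage matrix (assumed data): rows are tests, columns are statements.
passing = [[1, 1, 0, 0], [1, 0, 1, 0], [1, 1, 1, 0], [1, 0, 0, 0]]
failing = [[1, 1, 0, 1], [1, 0, 1, 1]]

# Balance the two classes, then rank statements by suspiciousness.
balanced = failing + smote_failing(failing, len(passing) - len(failing))
scores = ochiai(passing, balanced)
ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
print(ranking)
```

In this toy input, the last statement is covered by every failing test and by no passing test, so it ranks first. In practice a library implementation such as the SMOTE class in imbalanced-learn would likely replace the hand-written interpolation.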
Pages: 12