Aligned metric representation based balanced multiset ensemble learning for heterogeneous defect prediction

Cited by: 13
Authors
Chen, Haowen [1]
Jing, Xiao-Yuan [1,2,3,4]
Zhou, Yuming [4]
Li, Bing [1]
Xu, Baowen [4]
Affiliations
[1] Wuhan Univ, Sch Comp Sci, Wuhan, Peoples R China
[2] Guangdong Univ Petrochem Technol, Sch Comp Sci, Maoming, Peoples R China
[3] Guangdong Univ Petrochem Technol, Guangdong Prov Key Lab Petrochem Equipment Fault, Maoming, Peoples R China
[4] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing, Peoples R China
Keywords
Heterogeneous defect prediction; Class imbalance learning; Aligned metric representation; Ensemble learning; Balanced multiset; CODE; MODELS; MACHINE; FAULTS
DOI
10.1016/j.infsof.2022.106892
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Context: Heterogeneous defect prediction (HDP) refers to defect prediction across projects that use different metrics. Most existing HDP methods map source and target data into a common metric space whose dimensions have no actual meaning, which weakens interpretability. In addition, HDP always suffers from the class imbalance problem.
Objective: To address these deficiencies of current HDP methods, we propose a novel HDP approach that reduces the heterogeneity of source and target data and deals with imbalanced data while retaining the actual meaning of each dimension of the constructed common metric space.
Method: We propose an Aligned Metric Representation based Balanced Multiset Ensemble Learning (BMEL+AMR) approach for HDP. AMR consists of shared, source-specific, and target-specific metrics. It is built by learning the translation from shared to specific metrics and reducing the distribution difference. To deal with imbalanced data, we design BMEL, which constructs multiple balanced subsets of the source data and produces an aggregated classifier for predicting the labels of target data.
Result: Experimental results on 22 public projects indicate that (1) among all competing methods, BMEL+AMR achieves the best performance on all indicators except Popt, followed by AMR; (2) compared with AMR alone, introducing BMEL yields statistically significant improvements on the non-effort-aware indicators except F1-score; compared with BMEL alone, introducing AMR yields statistically significant improvements on all indicators.
Conclusion: BMEL+AMR can effectively improve HDP performance by eliminating heterogeneity and dealing with imbalanced data, and AMR helps explain the prediction model.
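The balanced-subset idea described under Method can be illustrated with a small sketch. The abstract does not say how the subsets are sampled, which base learner is used, or how the classifiers are aggregated, so everything below is an assumption chosen for illustration: random undersampling of the majority class, a nearest-centroid base learner, and majority voting. The function names and toy data are hypothetical, and the AMR step is omitted entirely.

```python
import random

def balanced_subsets(X, y, n_subsets=5, seed=0):
    """Build balanced subsets: every minority sample plus an equally
    sized random draw from the majority class (assumed sampling scheme)."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    subsets = []
    for _ in range(n_subsets):
        idx = minority + rng.sample(majority, len(minority))
        subsets.append(([X[i] for i in idx], [y[i] for i in idx]))
    return subsets

def centroid(rows):
    """Column-wise mean of a non-empty list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def fit_nearest_centroid(X, y):
    """Hypothetical base learner: one centroid per class (both classes must be present)."""
    c0 = centroid([x for x, label in zip(X, y) if label == 0])
    c1 = centroid([x for x, label in zip(X, y) if label == 1])
    return c0, c1

def sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))

def predict_one(model, x):
    c0, c1 = model
    return 1 if sq_dist(x, c1) < sq_dist(x, c0) else 0

def ensemble_predict(models, X):
    """Aggregate by majority vote (assumed rule; ties go to class 0)."""
    preds = []
    for x in X:
        votes = sum(predict_one(m, x) for m in models)
        preds.append(1 if 2 * votes > len(models) else 0)
    return preds

# Toy imbalanced "source" data: six clean modules (0), two defective (1).
X_src = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.3, 0.2],
         [0.2, 0.4], [0.4, 0.1], [2.0, 2.1], [2.2, 1.9]]
y_src = [0, 0, 0, 0, 0, 0, 1, 1]

subsets = balanced_subsets(X_src, y_src, n_subsets=5)
models = [fit_nearest_centroid(Xs, ys) for Xs, ys in subsets]
predictions = ensemble_predict(models, [[0.1, 0.1], [2.1, 2.0]])
print(predictions)
```

Each subset here contains two defective and two clean samples, so no single base learner is trained on the original 6:2 imbalance. An odd number of subsets avoids tied votes.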
Pages: 14