Aligned metric representation based balanced multiset ensemble learning for heterogeneous defect prediction

Cited by: 13
Authors
Chen, Haowen [1]
Jing, Xiao-Yuan [1, 2, 3, 4]
Zhou, Yuming [4]
Li, Bing [1]
Xu, Baowen [4]
Affiliations
[1] Wuhan Univ, Sch Comp Sci, Wuhan, Peoples R China
[2] Guangdong Univ Petrochem Technol, Sch Comp Sci, Maoming, Peoples R China
[3] Guangdong Univ Petrochem Technol, Guangdong Prov Key Lab Petrochem Equipment Fault, Maoming, Peoples R China
[4] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing, Peoples R China
Keywords
Heterogeneous defect prediction; Class imbalance learning; Aligned metric representation; Ensemble learning; Balanced multiset; CODE; MODELS; MACHINE; FAULTS
DOI
10.1016/j.infsof.2022.106892
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Context: Heterogeneous defect prediction (HDP) refers to defect prediction across projects that use different metrics. Most existing HDP methods map source and target data into a common metric space in which the individual dimensions have no actual meaning, which weakens interpretability. In addition, HDP always suffers from the class imbalance problem.
Objective: To address these deficiencies of current HDP methods, we propose a novel HDP approach that reduces the heterogeneity between source and target data and handles imbalanced data, while retaining the actual meaning of each dimension of the constructed common metric space.
Method: We propose an Aligned Metric Representation based Balanced Multiset Ensemble Learning (BMEL+AMR) approach for HDP. AMR consists of shared, source-specific, and target-specific metrics. It is built by learning the translation from shared to specific metrics and by reducing the distribution difference. To deal with imbalanced data, we design BMEL, which constructs multiple balanced subsets of the source data and produces an aggregated classifier for predicting the labels of target data.
Result: Experimental results on 22 public projects indicate that (1) among all competing methods, BMEL+AMR achieves the best performance on all indicators except Popt, followed by AMR; (2) compared with AMR alone, introducing BMEL yields a statistically significant improvement on the non-effort-aware indicators except F1-score, and compared with BMEL alone, introducing AMR yields a statistically significant improvement on all indicators.
Conclusion: BMEL+AMR can effectively improve HDP performance by eliminating heterogeneity and dealing with imbalanced data, and AMR helps explain the prediction model.
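The balanced-subset idea in the Method part of the abstract can be illustrated with a minimal sketch. This is not the authors' BMEL implementation: the abstract does not specify the sampling scheme, the base learner, or the aggregation rule, so the sketch assumes random undersampling of the majority class, a toy nearest-centroid base learner, and majority voting. It also assumes source and target rows already share one metric space (the role AMR plays in the paper). All function names are hypothetical.

```python
import random
from collections import Counter


def balanced_subsets(X, y, n_subsets, seed=0):
    """Yield balanced subsets: every minority row plus an equal-size
    random draw (without replacement) of majority rows.
    Assumption: random undersampling; the paper may sample differently."""
    rng = random.Random(seed)
    counts = Counter(y)
    minority = min(counts, key=counts.get)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    maj_idx = [i for i, label in enumerate(y) if label != minority]
    for _ in range(n_subsets):
        idx = min_idx + rng.sample(maj_idx, len(min_idx))
        yield [X[i] for i in idx], [y[i] for i in idx]


def fit_centroids(X, y):
    """Toy base learner (an assumption): one mean vector per class."""
    counts = Counter(y)
    sums = {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict_centroid(model, row):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(row, centroid))
    return min(model, key=lambda label: sq_dist(model[label]))


def train_balanced_ensemble(X, y, n_subsets=5, seed=0):
    """Fit one base learner per balanced subset of the source data."""
    return [fit_centroids(Xs, ys) for Xs, ys in balanced_subsets(X, y, n_subsets, seed)]


def ensemble_predict(models, row):
    """Aggregate by majority vote; use an odd n_subsets to avoid ties."""
    votes = Counter(predict_centroid(m, row) for m in models)
    return votes.most_common(1)[0][0]
```

With an imbalanced source set (for example, eight clean modules and two defective ones), each subset contains two rows of each class, so every base learner sees the defective class at the same weight as the clean class before the votes are combined.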
Pages: 14