Aligned metric representation based balanced multiset ensemble learning for heterogeneous defect prediction

Cited by: 13
Authors
Chen, Haowen [1]
Jing, Xiao-Yuan [1, 2, 3, 4]
Zhou, Yuming [4]
Li, Bing [1]
Xu, Baowen [4]
Affiliations
[1] Wuhan Univ, Sch Comp Sci, Wuhan, Peoples R China
[2] Guangdong Univ Petrochem Technol, Sch Comp Sci, Maoming, Peoples R China
[3] Guangdong Univ Petrochem Technol, Guangdong Prov Key Lab Petrochem Equipment Fault, Maoming, Peoples R China
[4] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing, Peoples R China
Keywords
Heterogeneous defect prediction; Class imbalance learning; Aligned metric representation; Ensemble learning; Balanced multiset; CODE; MODELS; MACHINE; FAULTS
DOI
10.1016/j.infsof.2022.106892
CLC number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Context: Heterogeneous defect prediction (HDP) refers to defect prediction across projects that use different metrics. Most existing HDP methods map source and target data into a common metric space whose dimensions have no actual meaning, which weakens their interpretability. Moreover, HDP commonly suffers from the class imbalance problem.
Objective: To address these deficiencies of current HDP methods, we aim to propose a novel HDP approach that reduces the heterogeneity between source and target data and handles imbalanced data, while retaining the actual meaning of each dimension of the constructed common metric space.
Method: We propose an Aligned Metric Representation based Balanced Multiset Ensemble learning (BMEL+AMR) approach for HDP. AMR consists of shared, source-specific, and target-specific metrics. It is built by learning the translation from shared metrics to specific metrics and by reducing the distribution difference. To deal with imbalanced data, we design BMEL, which constructs multiple balanced subsets from the source data and produces an aggregated classifier for predicting the labels of target data.
Result: Experimental results on 22 public projects indicate that (1) among all competing methods, BMEL+AMR achieves the best performance on all indicators except Popt, followed by AMR; (2) compared with AMR alone, adding BMEL yields statistically significant improvements on all non-effort-aware indicators except F1-score, and compared with BMEL alone, adding AMR yields statistically significant improvements on all indicators.
Conclusion: BMEL+AMR can effectively improve HDP performance by eliminating heterogeneity and handling imbalanced data, and AMR helps explain the prediction model.
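The abstract describes BMEL only at a high level: build several balanced subsets from the imbalanced source data, train one classifier per subset, and aggregate them into a single predictor. The pure-Python sketch below illustrates that general balanced-multiset idea under stated assumptions: binary labels (1 = defective, 0 = clean), the majority class split into disjoint chunks that are each paired with all minority samples, a hypothetical nearest-centroid base learner, and majority voting. All function names are illustrative; the paper's actual subset construction, base classifier, and aggregation rule are not given in the abstract, so this is not the authors' implementation, and it leaves out AMR entirely.

```python
import random
from collections import Counter


def balanced_subsets(X, y, seed=0):
    """Assumed construction: split the majority class into k disjoint chunks
    (k = n_majority // n_minority) and pair each chunk with all minority samples."""
    counts = Counter(y)
    minority = min(counts, key=counts.get)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    maj_idx = [i for i, label in enumerate(y) if label != minority]
    random.Random(seed).shuffle(maj_idx)
    k = max(1, len(maj_idx) // len(min_idx))
    subsets = []
    for j in range(k):
        idx = min_idx + maj_idx[j::k]  # every k-th majority sample -> disjoint chunks
        subsets.append(([X[i] for i in idx], [y[i] for i in idx]))
    return subsets


def fit_centroids(X, y):
    """Hypothetical base learner: one mean vector per class (nearest centroid)."""
    sums, counts = {}, Counter(y)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for d, value in enumerate(row):
            acc[d] += value
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict_centroid(centroids, row):
    """Return the label whose centroid has the smallest squared Euclidean distance."""
    return min(centroids,
               key=lambda label: sum((a - b) ** 2 for a, b in zip(row, centroids[label])))


def fit_bmel(X, y, seed=0):
    """Train one base model per balanced subset."""
    return [fit_centroids(Xs, ys) for Xs, ys in balanced_subsets(X, y, seed)]


def predict_bmel(models, row):
    """Assumed aggregation: majority vote, with ties going to the defective class (1)."""
    votes = Counter(predict_centroid(m, row) for m in models)
    return 1 if votes[1] * 2 >= len(models) else 0


if __name__ == "__main__":
    # Toy imbalanced source data: 2 defective vs 8 clean modules, two metrics each.
    X_src = [[9.0, 8.0], [8.5, 9.5]] + [[1.0 + 0.1 * i, 2.0 - 0.1 * i] for i in range(8)]
    y_src = [1, 1] + [0] * 8
    models = fit_bmel(X_src, y_src)
    print(len(models))  # 4 balanced subsets (8 clean // 2 defective)
    print([predict_bmel(models, row) for row in [[8.8, 9.0], [1.2, 1.9]]])  # [1, 0]
```

Splitting the majority class into disjoint chunks means every clean module is used by exactly one base model and no defective module is discarded, which is the usual motivation for balanced-multiset ensembles over plain random undersampling. Any other base learner with a fit/predict pair could replace the nearest-centroid choice here.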
Pages: 14
Related papers
50 in total
  • [41] CUS-heterogeneous ensemble-based financial distress prediction for imbalanced dataset with ensemble feature selection
    Du, Xudong
    Li, Wei
    Ruan, Sumei
    Li, Li
    APPLIED SOFT COMPUTING, 2020, 97
  • [42] Neighborhood Approximate Reducts-Based Ensemble Learning Algorithm and Its Application in Software Defect Prediction
    Yang, Zhiyong
    Du, Junwei
    Hu, Qiang
    Jiang, Feng
    ROUGH SETS, IJCRS 2022, 2022, 13633 : 100 - 113
  • [43] A random approximate reduct-based ensemble learning approach and its application in software defect prediction
    Jiang, Feng
    Yu, Xu
    Gong, Dunwei
    Du, Junwei
    INFORMATION SCIENCES, 2022, 609 : 1147 - 1168
  • [44] Improved Shallow Landslide Susceptibility Prediction Based on Statistics and Ensemble Learning
    Liang, Zhu
    Liu, Wei
    Peng, Weiping
    Chen, Lingwei
    Wang, Changming
    SUSTAINABILITY, 2022, 14 (10)
  • [45] An Investigation of Imbalanced Ensemble Learning Methods for Cross-Project Defect Prediction
    Qiu, Shaojian
    Lu, Lu
    Jiang, Siyu
    Guo, Yang
    INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2019, 33 (12)
  • [46] Two-stage stacking heterogeneous ensemble learning method for gasoline octane number loss prediction
    Cui, Shaoze
    Qiu, Huaxin
    Wang, Sutong
    Wang, Yanzhang
    APPLIED SOFT COMPUTING, 2021, 113
  • [47] Image Sparse Representation Based on Ensemble Learning in Compressed Sensing
    Bao, Donghai
    Wang, Qingpei
    Ding, Jiajun
    Li, Sheng
    He, Xiongxiong
    2017 IEEE INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING, COMMUNICATIONS AND COMPUTING (ICSPCC), 2017
  • [48] MGREL: A multi-graph representation learning-based ensemble learning method for gene-disease association prediction
    Wang, Ziyang
    Gu, Yaowen
    Zheng, Si
    Yang, Lin
    Li, Jiao
    COMPUTERS IN BIOLOGY AND MEDICINE, 2023, 155
  • [49] Electricity Theft Detection Based on Bagging Heterogeneous Ensemble Learning
    You W.
    Shen K.
    Yang N.
    Li Q.
    Wu Y.
    Li W.
    Dianli Xitong Zidonghua/Automation of Electric Power Systems, 2021, 45 (02): 105 - 113
  • [50] A Stacking Heterogeneous Ensemble Learning Method for the Prediction of Building Construction Project Costs
    Park, Uyeol
    Kang, Yunho
    Lee, Haneul
    Yun, Seokheon
    APPLIED SCIENCES-BASEL, 2022, 12 (19)