Aligned metric representation based balanced multiset ensemble learning for heterogeneous defect prediction

Cited by: 13
Authors
Chen, Haowen [1]
Jing, Xiao-Yuan [1, 2, 3, 4]
Zhou, Yuming [4]
Li, Bing [1]
Xu, Baowen [4]
Affiliations
[1] Wuhan Univ, Sch Comp Sci, Wuhan, Peoples R China
[2] Guangdong Univ Petrochem Technol, Sch Comp Sci, Maoming, Peoples R China
[3] Guangdong Univ Petrochem Technol, Guangdong Prov Key Lab Petrochem Equipment Fault, Maoming, Peoples R China
[4] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing, Peoples R China
Keywords
Heterogeneous defect prediction; Class imbalance learning; Aligned metric representation; Ensemble learning; Balanced multiset; CODE; MODELS; MACHINE; FAULTS
DOI
10.1016/j.infsof.2022.106892
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Context: Heterogeneous defect prediction (HDP) refers to defect prediction across projects that use different metrics. Most existing HDP methods map source and target data into a common metric space in which each dimension has no actual meaning, which weakens their interpretability. In addition, HDP always suffers from the class imbalance problem.

Objective: To address these deficiencies of current HDP methods, we propose a novel HDP approach that reduces the heterogeneity of source and target data and deals with imbalanced data, while retaining the actual meaning of each dimension of the constructed common metric space.

Method: We propose an Aligned Metric Representation based Balanced Multiset Ensemble Learning (BMEL+AMR) approach for HDP. AMR consists of shared, source-specific, and target-specific metrics. It is built by learning the translation from shared to specific metrics and by reducing the distribution difference. To deal with imbalanced data, we design BMEL, which constructs multiple balanced subsets of the source data and produces an aggregated classifier for predicting the labels of the target data.

Result: Experimental results on 22 public projects indicate that (1) among all competing methods, BMEL+AMR achieves the best performance on all indicators except Popt, followed by AMR; (2) compared with AMR alone, the introduction of BMEL yields statistically significant improvements on the non-effort-aware indicators except F1-score; compared with BMEL alone, the introduction of AMR yields statistically significant improvements on all indicators.

Conclusion: BMEL+AMR can effectively improve HDP performance by eliminating heterogeneity and dealing with imbalanced data, and AMR helps to explain the prediction model.
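The abstract describes BMEL only at a high level: build several class-balanced subsets of the source data, train on each, and aggregate the predictions. The sketch below illustrates that general idea and is not the authors' implementation. It assumes binary labels (1 = defective, 0 = clean), random undersampling of the majority class, a nearest-centroid base learner, and majority voting. All function names, and these three choices, are hypothetical stand-ins, since the abstract does not specify them.

```python
import random
from collections import Counter


def balanced_subsets(X, y, n_subsets=5, seed=0):
    """Build class-balanced subsets by undersampling the majority class.

    Assumption: every subset keeps all minority-class rows and draws an
    equally sized random sample (without replacement) from the majority class.
    """
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    subsets = []
    for _ in range(n_subsets):
        idx = minority + rng.sample(majority, len(minority))
        subsets.append(([X[i] for i in idx], [y[i] for i in idx]))
    return subsets


def fit_centroids(X, y):
    """Hypothetical base learner: one mean vector (centroid) per class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, l in zip(X, y) if l == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict_one(centroids, x):
    """Return the label of the centroid closest to x (squared Euclidean)."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def ensemble_predict(models, X):
    """Aggregate the base learners by majority vote, one vote per model."""
    preds = []
    for x in X:
        votes = Counter(predict_one(m, x) for m in models)
        preds.append(votes.most_common(1)[0][0])
    return preds


def train_ensemble(X, y, n_subsets=5, seed=0):
    """Train one base learner per balanced subset of the source data."""
    return [fit_centroids(Xs, ys)
            for Xs, ys in balanced_subsets(X, y, n_subsets, seed)]
```

An odd `n_subsets` avoids tied votes in the binary case. In the setting the abstract describes, `X` would presumably be the source data after it has been mapped into the aligned metric representation, and `ensemble_predict` would be applied to the target data in that same representation.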
Pages: 14