Neighbor cleaning learning based cost-sensitive ensemble learning approach for software defect prediction

Times cited: 2
Authors
Li, Li [1]
Su, Renjia [1]
Zhao, Xin [1]
Affiliation
[1] Northeast Forestry University, School of Computer and Control Engineering, Harbin, People's Republic of China
Keywords
class imbalance; class overlap; cost-sensitive learning; machine learning; software defect prediction
DOI
10.1002/cpe.8017
Chinese Library Classification number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
The class imbalance problem in software defect prediction datasets biases prediction results toward the majority class, and the class overlap problem blurs the classification decision boundary; both degrade a model's prediction performance on the dataset. Neighbor cleaning learning (NCL) is an effective technique for defect prediction. To address both the class overlap problem and the class imbalance problem, this paper proposes an NCL-based cost-sensitive ensemble learning approach for software defect prediction (NCL_CSEL). First, base classifiers are trained on bootstrap-resampled data. The trained classifiers are then combined by a static ensemble to produce the final classification results. The proposed base classifier is an Adaptive Boosting (AdaBoost) classifier that combines NCL with cost-sensitive learning; it handles class overlap and class imbalance by balancing the proportion of overlapping samples removed by NCL against the size of the cost factor in cost-sensitive learning. Specifically, the NCL algorithm initializes the sample weights, and the cost-sensitive method updates them. Experiments on the NASA and AEEEM datasets show that adding the NCL algorithm improves the model's bal value by approximately 7% and its AUC value by 9.5%. Compared with existing methods for handling class imbalance, NCL_CSEL effectively addresses the class imbalance problem and significantly improves prediction performance.
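The abstract does not spell out how NCL is applied. As a minimal sketch, assuming NCL follows a simplified version of the classic neighborhood cleaning rule with k = 3 nearest neighbors and binary labels (1 = defective, the minority class), the code below flags majority-class samples that sit in overlapping regions. Because the abstract says NCL initializes the sample weights rather than only deleting samples, the hypothetical `removal_ratio` parameter scales down the initial weight of flagged samples (1.0 is equivalent to removing them). The function names are illustrative, not the authors' code.

```python
import numpy as np


def ncl_flags(X, y, k=3, minority=1):
    """Flag majority-class samples that a simplified neighborhood cleaning
    rule would remove. Assumes binary labels {0, 1}."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Pairwise squared Euclidean distances; a sample is never its own neighbor.
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d, np.inf)
    nn = np.argsort(d, axis=1)[:, :k]  # indices of the k nearest neighbors

    flagged = np.zeros(len(y), dtype=bool)
    for i in range(len(y)):
        neigh = nn[i]
        n_min = np.sum(y[neigh] == minority)
        vote = minority if 2 * n_min > k else 1 - minority  # k-NN majority vote
        if vote == y[i]:
            continue  # sample agrees with its neighborhood
        if y[i] != minority:
            flagged[i] = True  # noisy or overlapping majority sample
        else:
            # Misclassified minority sample: flag its majority-class neighbors.
            flagged[neigh[y[neigh] != minority]] = True
    return flagged


def ncl_initial_weights(X, y, removal_ratio=0.5, k=3, minority=1):
    """Hypothetical NCL-based initialization: flagged samples keep
    (1 - removal_ratio) of a unit weight, then weights are normalized."""
    flagged = ncl_flags(X, y, k, minority)
    w = np.where(flagged, 1.0 - removal_ratio, 1.0)
    return w / w.sum()


# Toy data: one clean (label 0) sample sits inside the defective cluster.
X = [[0.0], [0.1], [0.2], [0.3], [5.0], [5.1], [5.2], [5.05]]
y = [0, 0, 0, 0, 1, 1, 1, 0]
print(np.flatnonzero(ncl_flags(X, y)))  # [7]
```

Down-weighting instead of deleting keeps every sample available to the booster while still reducing the influence of the overlap region; how the paper actually trades this off against the cost factor is not stated in the abstract.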
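The training pipeline described in the abstract (bootstrap resampling, a cost-sensitive AdaBoost base classifier, and a static ensemble) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: decision stumps stand in for the weak learner, the weight update follows an AdaC2-style rule in which a per-sample cost multiplies both the vote weight and the boosting update, defective samples (label +1) receive the hypothetical `cost_factor`, and the static ensemble is an unweighted majority vote. Boosting weights default to uniform; the hypothetical `init_w` argument of `cs_adaboost` is where NCL-derived initial weights would be passed.

```python
import numpy as np


def fit_stump(X, y, w):
    """Weighted decision stump for labels in {-1, +1}: (feature, threshold, polarity)."""
    best, best_err = (0, 0.0, 1), np.inf
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for pol in (1, -1):
                pred = np.where(pol * (X[:, j] - thr) >= 0, 1, -1)
                err = w[pred != y].sum()
                if err < best_err:
                    best, best_err = (j, thr, pol), err
    return best


def stump_predict(stump, X):
    j, thr, pol = stump
    return np.where(pol * (X[:, j] - thr) >= 0, 1, -1)


def cs_adaboost(X, y, cost, n_rounds=10, init_w=None, eps=1e-10):
    """AdaC2-style boosting: the per-sample cost enters alpha and the update."""
    n = len(y)
    w = np.full(n, 1.0 / n) if init_w is None else np.asarray(init_w, float) / np.sum(init_w)
    model = []
    for _ in range(n_rounds):
        stump = fit_stump(X, y, w)
        pred = stump_predict(stump, X)
        cw = cost * w
        correct, wrong = cw[pred == y].sum(), cw[pred != y].sum()
        alpha = 0.5 * np.log((correct + eps) / (wrong + eps))
        if alpha <= 0:
            break  # weak learner no better than chance under the costs
        model.append((stump, alpha))
        if wrong == 0:
            break  # this bootstrap sample is already separated
        w = cost * w * np.exp(-alpha * y * pred)  # costly mistakes grow fastest
        w /= w.sum()
    return model


def boost_predict(model, X):
    score = np.zeros(len(X))
    for stump, alpha in model:
        score += alpha * stump_predict(stump, X)
    return np.where(score >= 0, 1, -1)


def train_ensemble(X, y, cost_factor=2.0, n_estimators=5, n_rounds=10, seed=0):
    """Bootstrap resampling, then one cost-sensitive booster per sample."""
    rng = np.random.default_rng(seed)
    members = []
    for _ in range(n_estimators):
        idx = rng.integers(0, len(y), size=len(y))  # sample with replacement
        Xb, yb = X[idx], y[idx]
        cost = np.where(yb == 1, cost_factor, 1.0)  # defective class costs more
        members.append(cs_adaboost(Xb, yb, cost, n_rounds))
    return members


def ensemble_predict(members, X):
    """Static ensemble: unweighted majority vote of the members."""
    votes = np.zeros(len(X))
    for m in members:
        votes += boost_predict(m, X)
    return np.where(votes >= 0, 1, -1)


X = np.arange(10, dtype=float).reshape(-1, 1)
y = np.array([-1] * 6 + [1] * 4)  # +1 marks defective modules
members = train_ensemble(X, y)
print(ensemble_predict(members, np.array([[0.0], [9.0]])))
```

An odd `n_estimators` avoids tied votes. Raising `cost_factor` pushes each booster toward the defective class, which is the knob the abstract describes balancing against the NCL removal proportion.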
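The abstract reports gains in bal and AUC but does not define bal. In the software defect prediction literature, bal (balance) is commonly computed from the probability of detection pd (recall on defective modules) and the probability of false alarm pf as bal = 1 - sqrt((0 - pf)^2 + (1 - pd)^2) / sqrt(2). The sketch below assumes that conventional definition; the function name `pd_pf_bal` is illustrative.

```python
import math


def pd_pf_bal(y_true, y_pred, positive=1):
    """Return (pd, pf, bal) for binary labels; `positive` marks defective modules."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    pd = tp / (tp + fn) if tp + fn else 0.0  # detection rate (recall)
    pf = fp / (fp + tn) if fp + tn else 0.0  # false alarm rate
    # Distance from the ideal ROC point (pf=0, pd=1), rescaled into [0, 1].
    bal = 1 - math.sqrt((0 - pf) ** 2 + (1 - pd) ** 2) / math.sqrt(2)
    return pd, pf, bal


pd, pf, bal = pd_pf_bal([1, 1, 0, 0, 0, 0], [1, 0, 1, 0, 0, 0])
print(pd, pf, round(bal, 4))  # 0.5 0.25 0.6047
```

A perfect predictor (pd = 1, pf = 0) scores bal = 1, and the worst case (pd = 0, pf = 1) scores 0, so bal rewards catching defects and avoiding false alarms at the same time.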
Pages: 14