Semi-supervised Software Defect Prediction Model Based on Tri-training

Cited by: 54
Authors
Meng, Fanqi [1 ,2 ]
Cheng, Wenying [1 ]
Wang, Jingdong [1 ]
Affiliations
[1] Northeast Elect Power Univ, Sch Comp Sci, Jilin 132012, Jilin, Peoples R China
[2] Guangdong Atv Acad Performing Arts, Dongguan 523710, Guangdong, Peoples R China
Source
KSII TRANSACTIONS ON INTERNET AND INFORMATION SYSTEMS | 2021 / Vol. 15 / No. 11
Keywords
Feature Normalization; Oversampling Techniques; Software Defect Prediction; Semi-supervised Learning; Unbalanced Classification; QUALITY;
DOI
10.3837/tiis.2021.11.009
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812 ;
Abstract
To address the difficulty of software defect prediction caused by a shortage of labelled defect samples and by class imbalance, this paper proposes a semi-supervised software defect prediction model that combines feature normalization, oversampling, and the Tri-training algorithm. First, feature normalization is used to smooth the feature data so that excessively large or small feature values do not distort the model's classification performance. Second, oversampling expands the labelled data to correct the class imbalance among the labelled samples. Finally, the Tri-training algorithm learns from the training samples and builds the defect prediction model. The novelty of the model lies in combining feature normalization, oversampling, and Tri-training to address both the shortage of labelled samples and the class imbalance problem. Simulation experiments on the NASA software defect prediction dataset show that the proposed method outperforms four existing supervised and semi-supervised learning methods in terms of Precision, Recall, and F-Measure.
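The three stages named in the abstract can be illustrated with a minimal sketch. The abstract does not say which normalization, oversampling method, or base classifier the authors used, so everything below is an assumption made for illustration: min-max scaling, random duplication of minority rows (a simpler stand-in for SMOTE-style oversampling), a nearest-centroid base learner, a stratified bootstrap for the three initial models, and a fixed number of rounds in place of the error-rate checks of the original Tri-training algorithm. All function and class names are hypothetical, and the data is a toy example, not the NASA dataset.

```python
import random

def min_max_normalize(rows):
    """Scale each feature column to [0, 1]; constant columns become 0."""
    cols = list(zip(*rows))
    lo, hi = [min(c) for c in cols], [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0
             for v, l, h in zip(row, lo, hi)] for row in rows]

def random_oversample(X, y, rng):
    """Duplicate random rows of the smaller classes until all classes match."""
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        X_out += rows + [rng.choice(rows) for _ in range(target - len(rows))]
        y_out += [label] * target
    return X_out, y_out

class NearestCentroid:
    """Assumed base learner: predict the class whose mean vector is closest."""
    def fit(self, X, y):
        sums, counts = {}, {}
        for xi, yi in zip(X, y):
            acc = sums.setdefault(yi, [0.0] * len(xi))
            for j, v in enumerate(xi):
                acc[j] += v
            counts[yi] = counts.get(yi, 0) + 1
        self.centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}
        return self

    def predict_one(self, x):
        def sq_dist(c):
            return sum((a - b) ** 2 for a, b in zip(x, self.centroids[c]))
        return min(sorted(self.centroids), key=sq_dist)

def stratified_bootstrap(X, y, rng):
    """Resample with replacement inside each class so no class disappears."""
    Xb, yb = [], []
    for label in sorted(set(y)):
        rows = [xi for xi, yi in zip(X, y) if yi == label]
        Xb += [rng.choice(rows) for _ in rows]
        yb += [label] * len(rows)
    return Xb, yb

def tri_train(X_lab, y_lab, X_unlab, rounds=3, seed=0):
    """Simplified Tri-training: each model is retrained on the labelled data
    plus the unlabelled rows on which the other two models agree."""
    rng = random.Random(seed)
    models = [NearestCentroid().fit(*stratified_bootstrap(X_lab, y_lab, rng))
              for _ in range(3)]
    for _ in range(rounds):
        updated = []
        for i in range(3):
            j, k = [m for m in range(3) if m != i]
            X_new, y_new = list(X_lab), list(y_lab)
            for x in X_unlab:
                label = models[j].predict_one(x)
                if label == models[k].predict_one(x):  # the two peers agree
                    X_new.append(x)
                    y_new.append(label)
            updated.append(NearestCentroid().fit(X_new, y_new))
        models = updated
    return models

def predict(models, x):
    """Majority vote of the three models."""
    votes = [m.predict_one(x) for m in models]
    return max(set(votes), key=votes.count)

# Toy data: two metrics per module; label 1 = defective (minority class).
raw_lab = [[10, 1], [12, 2], [11, 1], [13, 2], [90, 9], [95, 8]]
y_lab = [0, 0, 0, 0, 1, 1]
raw_unlab = [[9, 1], [14, 2], [92, 9], [88, 8]]

scaled = min_max_normalize(raw_lab + raw_unlab)      # step 1: normalize
X_lab, X_unlab = scaled[:len(raw_lab)], scaled[len(raw_lab):]
X_bal, y_bal = random_oversample(X_lab, y_lab, random.Random(0))  # step 2
models = tri_train(X_bal, y_bal, X_unlab)            # step 3: Tri-training
print([predict(models, x) for x in X_unlab])         # [0, 0, 1, 1]
```

In this sketch the labelled and unlabelled rows are normalized together so that both share one feature scale, and oversampling is applied only to the labelled rows, which matches the abstract's statement that oversampling addresses the imbalance among labelled samples.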
Pages: 4028 - 4042
Number of pages: 15
Related papers
27 in total
  • [1] An empirical study based on semi-supervised hybrid self-organizing map for software fault prediction
    Abaei, Golnoush
    Selamat, Ali
    Fujita, Hamido
    [J]. KNOWLEDGE-BASED SYSTEMS, 2015, 74 : 28 - 39
  • [2] Blum A., 1998, Proceedings of the Eleventh Annual Conference on Computational Learning Theory, P92, DOI 10.1145/279943.279962
  • [3] Cai Liang, 2019, Journal of Software, V30, P1288
  • [4] Unlabelled extra data do not always mean extra performance for semi-supervised fault prediction
    Catal, Cagatay
    Diri, Banu
    [J]. EXPERT SYSTEMS, 2009, 26 (05) : 458 - 471
  • [5] SMOTE: Synthetic minority over-sampling technique
    Chawla, Nitesh V.
    Bowyer, Kevin W.
    Hall, Lawrence O.
    Kegelmeyer, W. Philip
    [J]. 2002, American Association for Artificial Intelligence (16)
  • [6] Research on Static Software Defect Prediction Methods
    Chen, Xiang
    Gu, Qing
    Liu, Wangshu
    Liu, Shulong
    Ni, Chao
    [J]. Journal of Software, 2016, 27 (01) : 1 - 25
  • [7] Investigation on the stability of SMOTE-based oversampling techniques in software defect prediction
    Feng, Shuo
    Keung, Jacky
    Yu, Xiao
    Xiao, Yan
    Zhang, Miao
    [J]. INFORMATION AND SOFTWARE TECHNOLOGY, 2021, 139
  • [8] Gong Li-Na, 2019, Journal of Software, V30, P3090, DOI 10.13328/j.cnki.jos.005790
  • [9] Halstead MH., 1977, Elements of Software Science (Operating and Programming Systems Series)
  • [10] Software Defect Detection with Rocus
    Jiang, Yuan
    Li, Ming
    Zhou, Zhi-Hua
    [J]. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 2011, 26 (02) : 328 - 342