Semi-supervised Software Defect Prediction Model Based on Tri-training

Cited by: 54

Authors:

Meng, Fanqi [1, 2]

Cheng, Wenying [1]

Wang, Jingdong [1]

Affiliations:

[1] Northeast Elect Power Univ, Sch Comp Sci, Jilin 132012, Jilin, Peoples R China

[2] Guangdong Atv Acad Performing Arts, Dongguan 523710, Guangdong, Peoples R China

Source:

KSII TRANSACTIONS ON INTERNET AND INFORMATION SYSTEMS | 2021 / Vol. 15 / No. 11

Keywords:

Feature Normalization; Oversampling Techniques; Software Defect Prediction; Semi-supervised Learning; Unbalanced Classification; QUALITY;

DOI:

10.3837/tiis.2021.11.009

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812 ;

Abstract:

Software defect prediction is difficult when labelled defect samples are scarce and the classes are imbalanced. To address both problems, a semi-supervised software defect prediction model is proposed that combines feature normalization, oversampling, and the Tri-training algorithm. First, feature normalization smooths the feature data so that excessively large or small feature values do not distort the model's classification performance. Second, oversampling expands the labelled data to correct the class imbalance among the labelled samples. Finally, the Tri-training algorithm learns from the training samples and builds the defect prediction model. The novelty of the model lies in combining feature normalization, oversampling, and Tri-training to address the shortage of labelled samples and the class imbalance problem together. Simulation experiments on the NASA software defect prediction dataset show that the proposed method outperforms four existing supervised and semi-supervised learning methods in terms of Precision, Recall, and F-Measure.
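
The three stages named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say which normalization or oversampling variant is used, so min-max scaling and random duplication of minority samples are assumed here, the base learner is a toy nearest-centroid classifier, and the Tri-training loop omits the error-rate checks of the full algorithm. All function names and the toy metric values are hypothetical.

```python
import random

def min_max_normalize(rows):
    """Scale each feature column to [0, 1]; constant columns map to 0.0."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0
             for v, l, h in zip(r, lo, hi)] for r in rows]

def random_oversample(X, y, rng):
    """Assumed variant: duplicate random minority samples until classes balance."""
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    target = max(len(items) for items in by_class.values())
    X_out, y_out = [], []
    for label, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for xi in items + extra:
            X_out.append(xi)
            y_out.append(label)
    return X_out, y_out

def fit_centroids(X, y):
    """Toy base learner: one mean vector per class."""
    sums, counts = {}, {}
    for xi, yi in zip(X, y):
        acc = sums.setdefault(yi, [0.0] * len(xi))
        for j, v in enumerate(xi):
            acc[j] += v
        counts[yi] = counts.get(yi, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}

def predict(centroids, x):
    """Return the class whose centroid is nearest (squared Euclidean distance)."""
    return min(centroids,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(centroids[c], x)))

def tri_train(X_lab, y_lab, X_unlab, rng, rounds=3):
    """Simplified Tri-training: three learners start from bootstrap samples; in
    each round, learner i is refit on the labelled data plus every unlabelled
    sample that the other two learners label identically."""
    n = len(X_lab)
    models = []
    for _ in range(3):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_centroids([X_lab[i] for i in idx],
                                    [y_lab[i] for i in idx]))
    for _ in range(rounds):
        new_models = []
        for i in range(3):
            j, k = [m for m in range(3) if m != i]
            X_new, y_new = list(X_lab), list(y_lab)
            for x in X_unlab:
                pj, pk = predict(models[j], x), predict(models[k], x)
                if pj == pk:  # the two peers agree: accept the pseudo-label
                    X_new.append(x)
                    y_new.append(pj)
            new_models.append(fit_centroids(X_new, y_new))
        models = new_models
    return models

def vote(models, x):
    """Majority vote of the three learners."""
    preds = [predict(m, x) for m in models]
    return max(set(preds), key=preds.count)

rng = random.Random(0)
# Hypothetical module metrics (e.g. size, complexity); label 1 = defective.
raw_labelled = [[10, 1], [12, 2], [11, 1], [14, 2], [13, 1], [90, 9], [95, 8]]
labels = [0, 0, 0, 0, 0, 1, 1]
raw_unlabelled = [[9, 1], [92, 9], [15, 2], [88, 8]]

scaled = min_max_normalize(raw_labelled + raw_unlabelled)
X_lab, X_unlab = scaled[:len(labels)], scaled[len(labels):]
X_bal, y_bal = random_oversample(X_lab, labels, rng)
models = tri_train(X_bal, y_bal, X_unlab, rng)
print([vote(models, x) for x in X_unlab])
```

Normalization runs before oversampling so that duplicated samples do not change the feature ranges, and the unlabelled samples are scaled together with the labelled ones so that all three learners see features on the same scale.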


Pages: 4028-4042

Page count: 15
