Improving the undersampling technique by optimizing the termination condition for software defect prediction

Times cited: 15

Authors:

Feng, Shuo [1]

Keung, Jacky [2]

Xiao, Yan [3, 4]

Zhang, Peichang [5]

Yu, Xiao [6]

Cao, Xiaochun [3]

Affiliations:

[1] Zhengzhou Univ, Sch Comp & Artificial Intelligence, Zhengzhou, Peoples R China

[2] City Univ Hong Kong, Dept Comp Sci, Kowloon, Hong Kong, Peoples R China

[3] Sun Yat Sen Univ, Sch Cyber Sci & Technol, Shenzhen, Peoples R China

[4] Natl Univ Singapore, Sch Comp, Singapore, Singapore

[5] Shenzhen Univ, Coll Elect & Informat Engn, Shenzhen, Peoples R China

[6] Wuhan Univ Technol, Sch Comp Sci & Artificial Intelligence, Wuhan, Peoples R China

Source:

EXPERT SYSTEMS WITH APPLICATIONS | 2024 / Vol. 235

Keywords:

Software defect prediction; Class imbalance; Learning-to-rank; Undersampling; Oversampling; Data resampling; Differential evolution; Effect size; SMOTE; Quality; Metrics; Faults

DOI:

10.1016/j.eswa.2023.121084

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

The class imbalance problem significantly hinders the ability of software defect prediction (SDP) models to distinguish between defective (minority class) and non-defective (majority class) software instances. Recent studies on data resampling techniques have shown that Random UnderSampling (RUS) is more effective than several complex oversampling techniques at alleviating this problem. However, RUS removes majority class instances blindly, leading to significant information loss. These studies have also pointed out that the conventional termination condition of data resampling techniques (i.e., stopping resampling once the minority and majority classes contain the same number of instances) can result in suboptimal performance. In fact, an undersampling technique can be likened to a recommender system or a web search engine that recommends majority class instances to SDP models. Therefore, we propose the Learning-To-Rank Undersampling technique (LTRUS). Our work is novel in two aspects: (1) we treat the undersampling process as a learning-to-rank task, optimizing a linear model to rank majority class instances and removing them from the bottom of the ranking to alleviate the class imbalance problem; and (2) we propose two termination conditions for the undersampling technique that differ from the conventional one. Under the conventional termination condition, LTRUS significantly outperforms RUS, the clustering-based undersampling technique, the complexity-based oversampling technique, SMOTUNED, and Borderline-SMOTE, improving F-measure, AUC, and MCC by 8.9%, 7.6%, and 18.0% on average, respectively. Furthermore, LTRUS yields similar performance under the two proposed termination conditions, and under either of them it outperforms both LTRUS and all the other baselines under the conventional termination condition. The experimental results demonstrate the effectiveness of LTRUS and indicate that the conventional termination condition for data resampling techniques is improper.
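The abstract frames undersampling as ranking: score the majority class instances, sort them, and discard from the bottom until a termination condition is met. The sketch below contrasts that idea with RUS under stated assumptions. The weight vector is supplied by hand (the paper optimizes it, and that optimization step is not shown); the helper names `target_size`, `random_undersample`, and `rank_undersample` are hypothetical; and the `ratio` parameter is an illustrative knob, not one of the two termination conditions the paper proposes, which the abstract does not specify. With `ratio=1.0` it reproduces the conventional condition, where both classes end up the same size.

```python
import random
from typing import List, Sequence, Tuple

# (feature vector, label); label 1 = defective (minority), 0 = non-defective (majority)
Instance = Tuple[List[float], int]


def target_size(n_minority: int, n_majority: int, ratio: float = 1.0) -> int:
    """Hypothetical termination rule: number of majority instances to keep.

    ratio=1.0 is the conventional condition (equal class sizes). Other values
    are illustrative only and are NOT the conditions proposed in the paper.
    """
    return max(0, min(n_majority, round(ratio * n_minority)))


def random_undersample(data: Sequence[Instance], keep: int, seed: int = 0) -> List[Instance]:
    """RUS: keep every minority instance plus `keep` majority instances chosen at random."""
    rng = random.Random(seed)
    minority = [d for d in data if d[1] == 1]
    majority = [d for d in data if d[1] == 0]
    return minority + rng.sample(majority, keep)


def rank_undersample(data: Sequence[Instance], weights: Sequence[float], keep: int) -> List[Instance]:
    """Ranking-based undersampling (a sketch, not LTRUS itself): score each
    majority instance with a linear model w.x, sort highest score first, and
    drop instances from the bottom of the ranking until `keep` remain.
    `weights` are assumed given; how the paper optimizes them is not shown."""
    minority = [d for d in data if d[1] == 1]
    majority = [d for d in data if d[1] == 0]

    def score(x: Sequence[float]) -> float:
        return sum(w * xi for w, xi in zip(weights, x))

    ranked = sorted(majority, key=lambda d: score(d[0]), reverse=True)
    return minority + ranked[:keep]


if __name__ == "__main__":
    data = [([0.9, 0.1], 0), ([0.2, 0.8], 0), ([0.5, 0.5], 0),
            ([0.1, 0.3], 0), ([0.7, 0.3], 1)]
    weights = [1.0, -1.0]  # hypothetical, hand-picked weights
    keep = target_size(n_minority=1, n_majority=4)  # conventional condition: keep 1
    print(rank_undersample(data, weights, keep))    # [([0.7, 0.3], 1), ([0.9, 0.1], 0)]
    print(random_undersample(data, keep, seed=0))   # minority + 1 random majority instance
```

The two functions share the same inputs and output shape, so swapping one for the other is the only change between pipelines. This mirrors the abstract's contrast: RUS discards majority instances blindly, whereas the ranking variant discards the lowest-ranked ones first, and the amount discarded is controlled separately by the termination rule.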


Pages: 13
