Prospect theory-based oversampling for software defect prediction

Cited by: 0

Authors:

Xu, Biao [1, 2]

Yan, Yuanting [1, 2]

Zhang, Yiwen [1, 2]

Affiliations:

[1] Key Laboratory of Intelligent Computing and Signal Processing, Ministry of Education, Anhui University, Hefei

[2] School of Computer Science and Technology, Anhui University, Hefei

Source:

Jisuanji Jicheng Zhizao Xitong/Computer Integrated Manufacturing Systems, CIMS | 2024, Vol. 30, No. 8

Funding:

National Natural Science Foundation of China

Keywords:

class imbalance; data difficulty factors; oversampling; prospect theory; software defect prediction

DOI:

10.13196/j.cims.2023.BPM06

Abstract:

In software defect prediction, data difficulty factors have a more significant impact on prediction performance than class imbalance. However, most existing oversampling methods ignore the data difficulty factors inherent in software project datasets when addressing the class imbalance problem, which leads to poor prediction performance. To address these problems, a Prospect theory-based Over Sampling algorithm (POS) for software defect prediction was proposed, which evaluated the learning difficulty of minority samples by considering the influence of homogeneous and heterogeneous samples within the local neighborhood. Specifically, POS constructed homogeneous gains and heterogeneous losses to characterize the prospect value of minority samples via a gravity-based strategy, and strengthened heterogeneous losses when calculating the sampling weights of minority samples. This reduced the risk of introducing data difficulty factors, improved the quality of synthetic samples, and further improved prediction performance. Experimental results on the NASA datasets showed that POS outperformed the comparison algorithms in terms of the performance metrics AUC, balance and G-mean. © 2024 CIMS. All rights reserved.
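
The abstract names balance and G-mean as evaluation metrics without defining them. The following is a minimal sketch of how they are commonly computed in the defect prediction literature, assuming the defective class is the positive class; the function name `defect_metrics` and its argument names are illustrative and do not come from the paper. AUC is omitted because it requires ranked prediction scores rather than confusion-matrix counts.

```python
import math


def defect_metrics(tp, fn, fp, tn):
    """Return (pd, pf, balance, g_mean) from confusion-matrix counts.

    Assumption: the defective (minority) class is the positive class.
    """
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("both classes must be present")
    pd = tp / (tp + fn)  # probability of detection (recall on defective modules)
    pf = fp / (fp + tn)  # probability of false alarm
    # balance: 1 minus the normalised Euclidean distance
    # from the ideal point (pf = 0, pd = 1)
    balance = 1 - math.sqrt((0 - pf) ** 2 + (1 - pd) ** 2) / math.sqrt(2)
    # G-mean: geometric mean of recall on each class
    g_mean = math.sqrt(pd * (1 - pf))
    return pd, pf, balance, g_mean
```

Both metrics reach 1 only when every defective module is detected with no false alarms, so neither rewards a classifier that simply predicts the majority class.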

Citation:

Pages: 2822-2831

Page count: 9

References: 27 in total

[1] LIU Wangshu, CHEN Xiang, GU Qing, et al., A noise tolerable feature selection framework for software defect prediction, Journal of Computers, 41, 3, pp. 506-520, (2018)
[2] YU Qiao, JIANG Shujuan, ZHANG Yanmei, et al., The impact study of class imbalance on the performance of software defect prediction models, Journal of Computers, 41, 4, pp. 809-824, (2018)
[3] REN Yanping, ZHENG Zhong, JIANG Yifei, et al., Posterior probability and density-based imbalanced data undersampling, Computer Engineering and Applications, 58, 23, pp. 268-277, (2022)
[4] KOZIARSKI M., Potential Anchoring for imbalanced data classification, Pattern Recognition, 120, (2021)
[5] BRZEZINSKI D, MINKU L L, PEWINSKI T, et al., The impact of data difficulty factors on classification of imbalanced and concept drifting data streams, Knowledge and Information Systems, 63, 6, pp. 1429-1469, (2021)
[6] SUN Y, CAI L J, LIAO B, et al., A robust oversampling approach for class imbalance problem with small disjuncts, IEEE Transactions on Knowledge and Data Engineering, 35, 6, pp. 5550-5562, (2023)
[7] VUTTIPITTAYAMONGKOL P, ELYAN E, PETROVSKI A., On the class overlap problem in imbalanced data classification, Knowledge-Based Systems, 212, (2021)
[8] LI J N, ZHU Q S, WU Q W, et al., SMOTE-NaN-DE: Addressing the noisy and borderline examples problem in imbalanced classification by natural neighbors and differential evolution, Knowledge-Based Systems, 223, (2021)
[9] ZHOU L G., Performance of corporate bankruptcy prediction models on imbalanced dataset: The effect of sampling methods, Knowledge-Based Systems, 41, pp. 16-25, (2013)
