MAHAKIL: Diversity based Oversampling Approach to Alleviate the Class Imbalance Issue in Software Defect Prediction Extended Abstract
Cited by: 11
Authors:
Bennin, Kwabena E. [1]
Keung, Jacky [1]
Phannachitta, Passakorn [2]
Monden, Akito [3]
Mensah, Solomon [1]
Affiliations:
[1] City Univ Hong Kong, Dept Comp Sci, Hong Kong, Peoples R China
[2] Chiang Mai Univ, Coll Arts Media & Technol, Chiang Mai, Thailand
[3] Okayama Univ, Grad Sch Nat Sci & Technol, Okayama, Japan
Source:
PROCEEDINGS 2018 IEEE/ACM 40TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING (ICSE) | 2018
Keywords:
Software defect prediction;
Class imbalance learning;
Synthetic sample generation;
Data sampling methods;
Classification problems;
DOI:
10.1145/3180155.3182520
Chinese Library Classification (CLC) number:
TP31 [Computer software];
Discipline classification codes:
081202 ;
0835 ;
Abstract:
This study presents MAHAKIL, a novel and efficient synthetic oversampling approach for software defect datasets that is based on the chromosomal theory of inheritance. Exploiting this theory, MAHAKIL interprets two distinct sub-classes as parents and generates a new instance that inherits different traits from each parent and contributes to the diversity within the data distribution. We extensively compare MAHAKIL with five other sampling approaches using 20 releases of defect datasets from the PROMISE repository and five prediction models. Our experiments indicate that MAHAKIL improves the prediction performance for all the models and achieves better and more significant pf (probability of false alarm) values than the other oversampling approaches, based on robust statistical tests.
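The parent-and-child idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the two sub-classes are formed or how traits are inherited, so the sketch assumes the minority (defective) class is split in two by ranking on Mahalanobis distance from the class mean, and that a child is the feature-wise mean of one parent from each group. The function name `diversity_oversample` and its parameters are hypothetical.

```python
import numpy as np


def diversity_oversample(minority, n_new):
    """Hypothetical helper: build n_new synthetic rows from a 2-D minority-class matrix."""
    minority = np.asarray(minority, dtype=float)
    if len(minority) < 2 or n_new <= 0:
        return np.empty((0, minority.shape[1]))

    # Assumption: rank samples by squared Mahalanobis distance from the class mean.
    centered = minority - minority.mean(axis=0)
    cov = np.atleast_2d(np.cov(minority, rowvar=False))
    inv_cov = np.linalg.pinv(cov)  # pseudo-inverse tolerates a singular covariance
    dist = np.einsum("ij,jk,ik->i", centered, inv_cov, centered)
    order = np.argsort(-dist)  # farthest samples first

    # Assumption: the two halves of the ranking act as the two parent groups.
    half = len(order) // 2
    parents_a = minority[order[:half]]
    parents_b = minority[order[half:2 * half]]

    children = []
    while len(children) < n_new:
        # Each child is the feature-wise mean of one parent from each group.
        batch = (parents_a + parents_b) / 2.0
        children.extend(batch)
        # The next round pairs the first group with the newest children.
        parents_b = batch
    return np.array(children[:n_new])
```

Because every synthetic row is an average of existing rows, the generated instances stay inside the range already covered by the minority class, which is one plausible reading of how such an approach could limit false alarms. For example, `diversity_oversample(X_defective, 50)` would return 50 synthetic rows with the same number of columns as `X_defective` (a hypothetical array of defective-module feature vectors).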
Pages: 699-699
Number of pages: 1