Probability Density Machine: A New Solution of Class Imbalance Learning

Cited by: 5
Authors
Cheng, Ruihan [1 ]
Zhang, Longfei [1 ]
Wu, Shiqi [1 ]
Xu, Sen [2 ]
Gao, Shang [1 ,3 ]
Yu, Hualong [1 ,3 ]
Affiliations
[1] Jiangsu Univ Sci & Technol, Sch Comp, Zhenjiang, Jiangsu, Peoples R China
[2] Yancheng Inst Technol, Sch Informat Technol, Yancheng, Peoples R China
[3] Sichuan Univ Sci & Engn, Artificial Intelligence Key Lab Sichuan Prov, Yibin, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
SUPPORT VECTOR MACHINE; SOIL CLASSES; CLASSIFICATION; ENSEMBLE; SMOTE;
DOI
10.1155/2021/7555587
Chinese Library Classification (CLC) number
TP31 [Computer Software];
Discipline classification code
081202 ; 0835 ;
Abstract
Class imbalance learning (CIL) is an important branch of machine learning because classification models generally struggle to learn from imbalanced data, yet skewed data distributions arise frequently in real-world applications. In this paper, we introduce a novel CIL solution called the Probability Density Machine (PDM). First, in the context of the Gaussian Naive Bayes (GNB) predictive model, we analyze theoretically why an imbalanced data distribution degrades predictive performance, concluding that the impact of class imbalance is associated only with the prior probability and is unrelated to the class-conditional probability of the training data. In this context, we then explain the rationality of several traditional CIL techniques, and point out the drawback of combining GNB with them. Next, borrowing the idea of K-nearest-neighbors probability density estimation (KNN-PDE), we propose PDM, an improved GNB-based CIL algorithm. Finally, experiments on a large number of class-imbalanced data sets show that the proposed PDM algorithm yields promising results.
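The abstract's core idea — that class imbalance enters a GNB-style classifier only through the prior, so one can instead compare class-conditional densities estimated nonparametrically via KNN-PDE — can be sketched as follows. This is a minimal illustrative sketch, not the authors' exact PDM algorithm; the function names `knn_density` and `pdm_predict` are hypothetical, and the estimator used is the standard k-nearest-neighbor density estimate p(x | c) ≈ k / (n_c · V_k(x)), where V_k(x) is the volume of the ball reaching the k-th nearest neighbor within class c.

```python
import numpy as np

def knn_density(x, class_samples, k=5):
    """k-NN density estimate of p(x | class): k divided by
    (class size * volume of the ball reaching the k-th
    nearest same-class neighbor)."""
    dists = np.sort(np.linalg.norm(class_samples - x, axis=1))
    r_k = dists[min(k, len(dists)) - 1]
    d = class_samples.shape[1]
    # d-ball volume up to a constant factor, which cancels
    # when densities are compared across classes.
    volume = r_k ** d + 1e-12
    return k / (len(class_samples) * volume)

def pdm_predict(x, X_train, y_train, k=5):
    """Classify by comparing class-conditional densities directly,
    deliberately ignoring the (skewed) class priors."""
    classes = np.unique(y_train)
    densities = [knn_density(x, X_train[y_train == c], k) for c in classes]
    return classes[int(np.argmax(densities))]
```

Because the prediction rule drops the prior term entirely, a minority class whose samples are locally dense around a query point can win the comparison even when it is heavily outnumbered globally — which is exactly the failure mode of plain GNB on imbalanced data that the paper analyzes.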
Pages: 14