Probability Density Machine: A New Solution of Class Imbalance Learning

Times cited: 5
Authors
Cheng, Ruihan [1]
Zhang, Longfei [1]
Wu, Shiqi [1]
Xu, Sen [2]
Gao, Shang [1,3]
Yu, Hualong [1,3]
Affiliations
[1] Jiangsu Univ Sci & Technol, Sch Comp, Zhenjiang, Jiangsu, Peoples R China
[2] Yancheng Inst Technol, Sch Informat Technol, Yancheng, Peoples R China
[3] Sichuan Univ Sci & Engn, Artificial Intelligence Key Lab Sichuan Prov, Yibin, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
SUPPORT VECTOR MACHINE; SOIL CLASSES; CLASSIFICATION; ENSEMBLE; SMOTE;
DOI
10.1155/2021/7555587
CLC number (Chinese Library Classification)
TP31 [Computer software]
Subject classification codes
081202; 0835
Abstract
Class imbalance learning (CIL) is an important branch of machine learning: classification models generally have difficulty learning from imbalanced data, yet skewed data distributions occur frequently in real-world applications. In this paper, we introduce a novel CIL solution called the Probability Density Machine (PDM). First, in the context of the Gaussian Naive Bayes (GNB) predictive model, we analyze theoretically why an imbalanced data distribution degrades the performance of the predictive model, and conclude that the impact of class imbalance is associated only with the prior probability, not with the conditional probability of the training data. Then, in this context, we show why several traditional CIL techniques are reasonable. Furthermore, we point out the drawback of combining GNB with these traditional CIL techniques. Next, building on the idea of K-nearest neighbors probability density estimation (KNN-PDE), we propose PDM, an improved GNB-based CIL algorithm. Finally, we conduct experiments on a large number of class-imbalanced data sets, on which the proposed PDM algorithm shows promising results.
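To make the GNB argument concrete, here is a minimal Python sketch using only the standard library. It is not the authors' PDM: the function names `fit_gaussian_nb`, `gnb_log_score`, `predict`, and `knn_density`, the variance floor, and the toy data are all hypothetical. In GNB the score for class c is log P(c) plus the per-feature Gaussian log-likelihoods, so, following the abstract's conclusion, class sizes influence the decision through the log P(c) term. Setting `use_prior=False` amounts to assuming equal priors, which mirrors what duplicating every minority sample equally often until the class sizes match would do, because exact duplicates leave the per-class means and variances unchanged. `knn_density` is the textbook KNN density estimate p(x) ≈ k / (n · V_d · r_k^d), included as an assumed stand-in; the exact KNN-PDE variant used in PDM may differ.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y, var_floor=1e-9):
    """Return {label: (prior, means, variances)} using per-class MLE estimates.

    var_floor is a hypothetical safeguard against zero variance.
    """
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    n = len(X)
    model = {}
    for label, rows in groups.items():
        d, m = len(rows[0]), len(rows)
        means = [sum(r[j] for r in rows) / m for j in range(d)]
        variances = [
            max(sum((r[j] - means[j]) ** 2 for r in rows) / m, var_floor)
            for j in range(d)
        ]
        model[label] = (m / n, means, variances)  # prior = class share of the data
    return model


def gnb_log_score(model, x, label, use_prior=True):
    """log P(label) + sum_j log N(x_j; mean_j, var_j); the prior term is optional."""
    prior, means, variances = model[label]
    log_lik = sum(
        -0.5 * math.log(2 * math.pi * v) - (xj - mu) ** 2 / (2 * v)
        for xj, mu, v in zip(x, means, variances)
    )
    return log_lik + (math.log(prior) if use_prior else 0.0)


def predict(model, x, use_prior=True):
    """Return the label with the highest (optionally prior-free) GNB score."""
    return max(model, key=lambda c: gnb_log_score(model, x, c, use_prior))


def knn_density(x, sample, k):
    """Textbook KNN estimate: p(x) ~ k / (n * V_d * r_k**d).

    r_k is the distance from x to its k-th nearest point in `sample`;
    assumes r_k > 0 (x should not coincide with k or more sample points).
    """
    n, d = len(sample), len(x)
    r_k = sorted(math.dist(x, s) for s in sample)[k - 1]
    unit_ball_volume = math.pi ** (d / 2) / math.gamma(d / 2 + 1)  # V_d
    return k / (n * unit_ball_volume * r_k ** d)


if __name__ == "__main__":
    # Toy 1-D data: 9 majority points (class 0) vs. 2 minority points (class 1).
    X = [[v] for v in (-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0)] + [[2.5], [3.5]]
    y = [0] * 9 + [1] * 2
    model = fit_gaussian_nb(X, y)
    print(predict(model, [2.0]))                   # 0: the skewed prior tips the decision
    print(predict(model, [2.0], use_prior=False))  # 1: the likelihood alone favors class 1
    print(round(knn_density([1.5], [[0.0], [1.0], [2.0], [3.0]], k=2), 6))  # 0.5
```

In this toy case the minority Gaussian (mean 3.0, variance 0.25) gives x = 2.0 a higher likelihood than the wider majority Gaussian, yet the 9:2 prior still pulls the prediction to the majority class. That is the prior-only effect of imbalance that the abstract describes.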
Pages: 14