Generative learning for imbalanced data using the Gaussian mixed model

Cited by: 20

Authors:

Xie, Yuxi [1]

Peng, Lizhi [1]

Chen, Zhenxiang [1]

Yang, Bo [1]

Zhang, Hongli [2]

Zhang, Haibo [3]

Affiliations:

[1] Univ Jinan, Shandong Prov Key Lab Network Based Intelligent C, Jinan 250022, Shandong, Peoples R China

[2] Harbin Inst Technol, Sch Comp Sci & Technol, Harbin 150002, Heilongjiang, Peoples R China

[3] Univ Otago, Dept Compute Sci, Dunedin, New Zealand

Source:

APPLIED SOFT COMPUTING | 2019 / Volume 79

Funding:

National Natural Science Foundation of China;

Keywords:

Imbalanced learning; Gaussian mixed model; Sample generation; NEURAL-NETWORKS; CLASSIFICATION; SMOTE; MACHINE; PERFORMANCE; ACCURACY; GMM;

DOI:

10.1016/j.asoc.2019.03.056

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Imbalanced data classification, an important type of classification task, is challenging for standard learning algorithms. Among the different strategies for handling the problem, data-level imbalanced learning methods are popular and have attracted ample attention from researchers in recent years. However, most data-level approaches generate new instances linearly from local neighbor information rather than from the overall data distribution. In contrast to these algorithms, in this study we develop a new data-level method, namely generative learning (GL), to deal with imbalanced problems. In GL, we fit the distribution of the original data with the Gaussian mixed model and generate new data from the fitted distribution. The generated data, including synthetic minority and majority classes, are used to train learning models. The proposed method is validated through experiments on real-world data sets. Results show that our approach is competitive with and comparable to other methods, such as SMOTE, SMOTE-ENN, SMOTE-TomekLinks, Borderline-SMOTE, and safe-level-SMOTE. A Wilcoxon signed rank test is also applied, and its results again show the significant superiority of our proposal. (C) 2019 Elsevier B.V. All rights reserved.
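
The generation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the one-dimensional toy data, the two-component mixture, the equal number of synthetic samples per class, and the helper names `fit_gmm_1d`, `sample_gmm_1d`, and `generate_balanced` are all assumptions made for illustration. The sketch fits one Gaussian mixture per class with plain EM and then draws a balanced synthetic data set from the fitted mixtures.

```python
import math
import random


def fit_gmm_1d(xs, k=2, iters=50, seed=0):
    """Fit a k-component 1-D Gaussian mixture to the list xs with plain EM."""
    rng = random.Random(seed)
    means = rng.sample(xs, k)  # initialise the means at random data points
    mean_all = sum(xs) / len(xs)
    var_all = sum((x - mean_all) ** 2 for x in xs) / len(xs) + 1e-6
    variances = [var_all] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point
        resp = []
        for x in xs:
            dens = [
                w * math.exp(-((x - m) ** 2) / (2 * v)) / math.sqrt(2 * math.pi * v)
                for w, m, v in zip(weights, means, variances)
            ]
            total = sum(dens) or 1e-300  # guard against underflow to zero
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means, and variances
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / len(xs)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = (
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj + 1e-6
            )
    return weights, means, variances


def sample_gmm_1d(params, n, rng):
    """Draw n values from a fitted mixture: pick a component, then sample it."""
    weights, means, variances = params
    out = []
    for _ in range(n):
        j = rng.choices(range(len(weights)), weights=weights)[0]
        out.append(rng.gauss(means[j], math.sqrt(variances[j])))
    return out


def generate_balanced(data_by_class, n_per_class, k=2, seed=0):
    """Fit one mixture per class and return n_per_class synthetic points for each."""
    rng = random.Random(seed)
    synthetic = []
    for label, xs in data_by_class.items():
        params = fit_gmm_1d(xs, k=k, seed=seed)
        synthetic += [(x, label) for x in sample_gmm_1d(params, n_per_class, rng)]
    return synthetic


# Toy imbalanced data (assumed): 200 majority points, 20 minority points.
toy_rng = random.Random(42)
majority = [toy_rng.gauss(0.0, 1.0) for _ in range(200)]
minority = [toy_rng.gauss(5.0, 0.5) for _ in range(20)]

# Both classes are replaced by synthetic samples, as the abstract describes.
balanced = generate_balanced({0: majority, 1: minority}, n_per_class=100)
```

The resulting `balanced` list of `(value, label)` pairs would then be passed to any classifier in place of the original imbalanced sample. Because the samples come from a fitted distribution instead of being interpolated between neighbors, they are not restricted to line segments between existing minority points, which is the contrast with SMOTE-style methods that the abstract draws.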

Pages: 439-451

Page count: 13

References: 54 in total

