Generative learning for imbalanced data using the Gaussian mixed model

Cited by: 20
Authors
Xie, Yuxi [1]
Peng, Lizhi [1]
Chen, Zhenxiang [1]
Yang, Bo [1]
Zhang, Hongli [2]
Zhang, Haibo [3]
Affiliations
[1] Univ Jinan, Shandong Prov Key Lab Network Based Intelligent C, Jinan 250022, Shandong, Peoples R China
[2] Harbin Inst Technol, Sch Comp Sci & Technol, Harbin 150002, Heilongjiang, Peoples R China
[3] Univ Otago, Dept Comp Sci, Dunedin, New Zealand
Funding
National Natural Science Foundation of China
Keywords
Imbalanced learning; Gaussian mixed model; Sample generation; NEURAL-NETWORKS; CLASSIFICATION; SMOTE; MACHINE; PERFORMANCE; ACCURACY; GMM
DOI
10.1016/j.asoc.2019.03.056
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Imbalanced data classification, an important type of classification task, is challenging for standard learning algorithms. Among the various strategies for handling the problem, data-level imbalanced learning methods are popular and have attracted considerable attention from researchers in recent years. However, most data-level approaches generate new instances by linear interpolation using local neighbor information rather than the overall data distribution. In contrast to these algorithms, this study develops a new data-level method, generative learning (GL), to deal with imbalanced problems. In GL, we fit the distribution of the original data with a Gaussian mixed model and generate new data from the fitted distribution. The generated data, comprising synthetic minority-class and majority-class instances, are used to train learning models. The proposed method is validated through experiments on real-world data sets. Results show that our approach is competitive with other methods, such as SMOTE, SMOTE-ENN, SMOTE-TomekLinks, Borderline-SMOTE, and safe-level-SMOTE. A Wilcoxon signed-rank test is also applied, and its results further confirm the significant superiority of our proposal. (C) 2019 Elsevier B.V. All rights reserved.
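The abstract describes GL only at a high level: fit a Gaussian mixture to the original data, then sample new minority-class and majority-class instances from it for training. The following is a minimal sketch of that idea, not the authors' implementation, under assumptions that do not come from the paper: one mixture per class, diagonal covariances fitted by a plain EM loop, a fixed number of components `k`, and every class regenerated up to the size of the largest class. The helper names `fit_diag_gmm`, `sample_gmm`, and `generate_balanced` are hypothetical. Only the Python standard library is used; in practice scikit-learn's `GaussianMixture` (`fit`, then `sample`) would normally replace the hand-written EM.

```python
import math
import random


def fit_diag_gmm(points, k, iters=50, seed=0):
    """Fit a k-component Gaussian mixture with diagonal covariances by EM.

    Hypothetical helper (not from the paper). Returns (weights, means, variances).
    """
    rng = random.Random(seed)
    n, d = len(points), len(points[0])
    eps = 1e-6  # variance floor that keeps components from collapsing to zero width
    # Initialise: means at k randomly chosen points, the overall per-dimension
    # variance for every component, and equal mixing weights.
    mu = [sum(p[t] for p in points) / n for t in range(d)]
    var0 = [max(sum((p[t] - mu[t]) ** 2 for p in points) / n, eps) for t in range(d)]
    means = [list(p) for p in rng.sample(points, k)]
    variances = [list(var0) for _ in range(k)]
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibilities, computed in log space for numerical stability.
        resp = []
        for x in points:
            logs = []
            for j in range(k):
                lp = math.log(weights[j])
                for t in range(d):
                    v = variances[j][t]
                    lp -= 0.5 * (math.log(2 * math.pi * v) + (x[t] - means[j][t]) ** 2 / v)
                logs.append(lp)
            top = max(logs)
            total = sum(math.exp(s - top) for s in logs)
            resp.append([math.exp(s - top) / total for s in logs])
        # M-step: re-estimate weights, means, and per-dimension variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) + eps
            weights[j] = nj / n
            means[j] = [sum(r[j] * x[t] for r, x in zip(resp, points)) / nj for t in range(d)]
            variances[j] = [
                max(sum(r[j] * (x[t] - means[j][t]) ** 2 for r, x in zip(resp, points)) / nj, eps)
                for t in range(d)
            ]
    return weights, means, variances


def sample_gmm(model, count, rng):
    """Draw `count` points: pick a component by weight, then sample each dimension."""
    weights, means, variances = model
    out = []
    for _ in range(count):
        j = rng.choices(range(len(weights)), weights=weights)[0]
        out.append([rng.gauss(m, math.sqrt(v)) for m, v in zip(means[j], variances[j])])
    return out


def generate_balanced(X, y, k=2, per_class=None, seed=0):
    """Fit one GMM per class and draw the same number of synthetic points for each class.

    Hypothetical helper; returns only generated data (no original points).
    """
    rng = random.Random(seed)
    classes = sorted(set(y))
    groups = {c: [x for x, label in zip(X, y) if label == c] for c in classes}
    if per_class is None:
        per_class = max(len(g) for g in groups.values())
    X_new, y_new = [], []
    for c in classes:
        kc = min(k, len(groups[c]))  # a class cannot support more components than points
        model = fit_diag_gmm(groups[c], kc, seed=seed)
        X_new.extend(sample_gmm(model, per_class, rng))
        y_new.extend([c] * per_class)
    return X_new, y_new


if __name__ == "__main__":
    data_rng = random.Random(1)
    # Toy imbalanced data: 200 majority points near (0, 0), 20 minority points near (4, 4).
    X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(200)]
    X += [[data_rng.gauss(4, 0.5), data_rng.gauss(4, 0.5)] for _ in range(20)]
    y = [0] * 200 + [1] * 20
    X_gen, y_gen = generate_balanced(X, y, k=2)
    print(y_gen.count(0), y_gen.count(1))  # 200 200
```

Because each class is drawn from its own fitted density rather than by interpolating between neighbors, the sketch reflects the contrast the abstract draws with SMOTE-style methods. How GL chooses the number of components and the amount of data to generate is not specified in the abstract, so both are left as parameters here.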
Pages: 439-451
Number of pages: 13