An ensemble learning algorithm with Gaussian-based oversampling

Cited by: 0
Authors
Zhang Z. [1, 2]
Chen Y. [1]
Tang J. [1]
Luo X. [1]
Affiliations
[1] College of Management, Hangzhou Dianzi University, Hangzhou
[2] Antai College of Economics and Management, Shanghai Jiao Tong University, Shanghai
Funding
National Natural Science Foundation of China
Keywords
Classification algorithm; Data mining; Ensemble learning; Imbalanced data; SMOTE
DOI
10.12011/SETP2020-1790
Abstract
Class imbalance arises widely in data-mining classification tasks such as assessing manufacturing quality conditions, medical diagnosis, and financial services. The synthetic minority over-sampling technique (SMOTE) is a common way to handle imbalanced datasets, and it can be strengthened by embedding it in a boosting framework. However, this strategy can easily leave the base classifiers of the ensemble lacking in diversity. To address this, we propose GSMOTEBoost (Gaussian-based SMOTE in boosting), a boosting algorithm that integrates Gaussian-process-based SMOTE oversampling to solve the imbalanced learning problem. To improve the robustness of the classification system, GSMOTEBoost is built on the AdaBoost framework, and in each iteration it applies Gaussian-process-based SMOTE oversampling to increase the diversity of the base classifiers. To verify its effectiveness, we compare GSMOTEBoost with well-known imbalance learning algorithms on twenty datasets selected from the KEEL repository, using G-mean, F-measure, and AUC as evaluation metrics and hypothesis testing to analyze the experimental results. The results, supported by appropriate statistical analysis, indicate that GSMOTEBoost significantly outperforms the compared methods. © 2021, Editorial Board of Journal of Systems Engineering Society of China. All rights reserved.
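The abstract describes GSMOTEBoost only at a high level: SMOTE oversampling based on a Gaussian process, applied inside every AdaBoost iteration so that each base classifier is trained on a different augmented set. The pure-Python sketch below illustrates that structure under explicit assumptions and is not the authors' algorithm. "Gaussian-based SMOTE" is approximated here by ordinary SMOTE interpolation followed by zero-mean Gaussian jitter with standard deviation `sigma`. The weak learner is a weighted decision stump. Training follows the SMOTEBoost pattern: each weak learner sees the original data plus a fresh synthetic minority set, while the error and weight update use the original samples only. All function names (`gaussian_smote`, `gsmote_boost`, `g_mean`, etc.) and defaults (`k=5`, `sigma=0.05`) are hypothetical. The `g_mean` helper computes one of the three metrics named in the abstract.

```python
import math
import random


def knn_indices(data, i, k):
    """Indices of the k nearest neighbours of data[i] in data (Euclidean), excluding i."""
    dists = sorted((math.dist(data[i], data[j]), j) for j in range(len(data)) if j != i)
    return [j for _, j in dists[:k]]


def gaussian_smote(minority, n_new, k=5, sigma=0.05, rng=random):
    """SMOTE interpolation followed by zero-mean Gaussian jitter (an ASSUMED reading
    of "Gaussian-based SMOTE"; the paper's exact formulation may differ).
    Needs at least two minority samples."""
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        j = rng.choice(knn_indices(minority, i, k))
        gap = rng.random()  # position along the segment from x to its neighbour
        x, nn = minority[i], minority[j]
        synthetic.append([a + gap * (b - a) + rng.gauss(0.0, sigma) for a, b in zip(x, nn)])
    return synthetic


def fit_stump(X, y, w):
    """Weighted decision stump for labels in {+1, -1}: returns (feature, threshold, polarity)."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            for pol in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if (pol if xi[f] <= t else -pol) != yi)
                if best is None or err < best[0]:
                    best = (err, f, t, pol)
    return best[1:]


def stump_predict(stump, x):
    f, t, pol = stump
    return pol if x[f] <= t else -pol


def gsmote_boost(X, y, rounds=10, sigma=0.05, seed=0):
    """Hypothetical AdaBoost-style loop drawing a fresh jittered SMOTE set every round.
    Minority class is labelled +1; synthetic points only train the weak learner,
    while the error and weight update use the original samples (SMOTEBoost pattern)."""
    rng = random.Random(seed)
    minority = [x for x, yi in zip(X, y) if yi == 1]
    n_new = max(0, len(X) - 2 * len(minority))  # enough synthetic points to balance classes
    w = [1.0 / len(X)] * len(X)
    ensemble = []
    for _ in range(rounds):
        syn = gaussian_smote(minority, n_new, sigma=sigma, rng=rng)  # new set each round
        stump = fit_stump(X + syn, y + [1] * len(syn), w + [1.0 / len(X)] * len(syn))
        err = sum(wi for xi, yi, wi in zip(X, y, w) if stump_predict(stump, xi) != yi)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) for a perfect stump
        if err >= 0.5:
            break  # weak learner no better than chance: stop, as in AdaBoost.M1
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi)) for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, x):
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score > 0 else -1


def g_mean(y_true, y_pred):
    """Geometric mean of minority recall (TPR) and majority recall (TNR)."""
    pos = [p for t, p in zip(y_true, y_pred) if t == 1]
    neg = [p for t, p in zip(y_true, y_pred) if t == -1]
    tpr = sum(p == 1 for p in pos) / len(pos)
    tnr = sum(p == -1 for p in neg) / len(neg)
    return math.sqrt(tpr * tnr)


if __name__ == "__main__":
    data_rng = random.Random(42)
    X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(90)]
    X += [[data_rng.gauss(3, 1), data_rng.gauss(3, 1)] for _ in range(10)]
    y = [-1] * 90 + [1] * 10
    model = gsmote_boost(X, y, rounds=10)
    print("training G-mean:", round(g_mean(y, [predict(model, x) for x in X]), 3))
```

Because every round draws a new synthetic minority set from the shared generator, each base classifier is fitted to different data, which is the mechanism the abstract credits for ensemble diversity. With very simple learners such as stumps on well-separated data, consecutive rounds may still select the same split.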
Pages: 513-523
Number of pages: 10