A design of information granule-based under-sampling method in imbalanced data classification

Cited by: 0
Authors
Tianyu Liu
Xiubin Zhu
Witold Pedrycz
Zhiwu Li
Affiliations
[1] Xidian University, School of Electro
[2] University of Alberta, Mechanical Engineering
[3] Macau University of Science and Technology, Department of Electrical and Computer Engineering
[4] King Abdulaziz University, Institute of Systems Engineering
[5] Guilin University of Electronic Technology, Faculty of Engineering
Source
Soft Computing | 2020 / Vol. 24
Keywords
Imbalanced data; Information granule; Support vector machine (SVM); K-nearest-neighbor (KNN); Under-sampling
DOI
Not available
Abstract
In numerous real-world problems, we are faced with difficulties in learning from imbalanced data. The classification performance of a “standard” classifier (learning algorithm) is evidently hindered by the imbalanced distribution of data. Over-sampling and under-sampling methods have been researched extensively with the aim of increasing the prediction accuracy over the minority class. However, traditional under-sampling methods tend to ignore important characteristics pertinent to the majority class. In this paper, a novel under-sampling method based on information granules is proposed. The method exploits the concepts and algorithms of granular computing. First, information granules are built around selected patterns coming from the majority class to capture the essence of the data belonging to this class. Subsequently, the resultant information granules are evaluated in terms of their quality, and those with the highest specificity values are selected. Next, the selected numeric data are augmented by weights implied by the size of the information granules. Finally, a support vector machine and a K-nearest-neighbor classifier, both regarded here as representative classifiers, are built based on the weighted data. Experimental studies are carried out using synthetic data as well as a suite of imbalanced data sets coming from public machine learning repositories. The experimental results quantify the performance of the support vector machine and K-nearest-neighbor classifiers with the under-sampling method based on information granules, and demonstrate its superiority over the performance obtained for these classifiers endowed with conventional under-sampling methods. In general, the improvement in performance expressed in terms of G-means exceeds 10% when applying information granule under-sampling compared with random under-sampling.
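The four steps named in the abstract (build granules, rank them by specificity, weight the retained patterns by granule size, train a classifier on the weighted data) can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how granules are shaped or how specificity is computed, so the ball-shaped granule, the linear specificity formula, and the function names `build_granule`, `granular_undersample`, `weighted_knn_predict`, and `g_mean` are all assumptions made here for illustration. Only the G-mean (the geometric mean of the true positive rate and the true negative rate) follows its standard definition.

```python
import math

def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_granule(center, majority, minority, scale):
    # Assumption: a granule is a ball whose radius stops at the nearest minority pattern.
    radius = min(dist(center, m) for m in minority)
    # Granule size: number of majority patterns strictly inside the ball.
    coverage = sum(1 for p in majority if dist(center, p) < radius)
    # Assumption: specificity falls linearly from 1 (radius 0) to 0 (radius == scale).
    specificity = 1.0 - radius / scale
    return center, coverage, specificity

def granular_undersample(majority, minority, n_keep):
    """Keep the n_keep majority patterns whose granules are most specific.

    Returns (pattern, weight) pairs, the weight being the granule size.
    Requires at least two distinct points so that scale is non-zero.
    """
    points = majority + minority
    scale = max(dist(a, b) for a in points for b in points)
    granules = [build_granule(c, majority, minority, scale) for c in majority]
    granules.sort(key=lambda g: g[2], reverse=True)
    return [(center, coverage) for center, coverage, _ in granules[:n_keep]]

def weighted_knn_predict(x, samples, k):
    """Weighted k-NN vote; samples is a list of (point, label, weight)."""
    nearest = sorted(samples, key=lambda s: dist(x, s[0]))[:k]
    votes = {}
    for _, label, weight in nearest:
        votes[label] = votes.get(label, 0.0) + weight
    return max(votes, key=votes.get)

def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of the true positive rate and the true negative rate."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    n_pos = sum(1 for t in y_true if t == positive)
    n_neg = len(y_true) - n_pos
    return math.sqrt((tp / n_pos) * (tn / n_neg))
```

Under these assumptions, the most specific granules are the smallest ones, so the retained majority patterns are those lying closest to the minority class; a different granule definition would select different patterns. For the SVM case, the same weights could in principle be passed to any solver that accepts per-sample weights.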
Pages: 17333 - 17347
Number of pages: 14
Related papers
  • [1] A design of information granule-based under-sampling method in imbalanced data classification
    Liu, Tianyu
    Zhu, Xiubin
    Pedrycz, Witold
    Li, Zhiwu
    SOFT COMPUTING, 2020, 24 (22) : 17333 - 17347
  • [2] AN IMBALANCED DATA CLASSIFICATION METHOD BASED ON AUTOMATIC CLUSTERING UNDER-SAMPLING
    Deng, Xiaoheng
    Zhong, Weijian
    Ren, Ju
    Zeng, Detian
    Zhang, Honggang
    2016 IEEE 35TH INTERNATIONAL PERFORMANCE COMPUTING AND COMMUNICATIONS CONFERENCE (IPCCC), 2016,
  • [3] Evolutionary under-sampling based bagging ensemble method for imbalanced data classification
    Sun, Bo
    Chen, Haiyan
    Wang, Jiandong
    Xie, Hua
    FRONTIERS OF COMPUTER SCIENCE, 2018, 12 (02) : 331 - 350
  • [4] Evolutionary under-sampling based bagging ensemble method for imbalanced data classification
    Bo Sun
    Haiyan Chen
    Jiandong Wang
    Hua Xie
    Frontiers of Computer Science, 2018, 12 : 331 - 350
  • [5] An Under-sampling Imbalanced Learning of Data Gravitation Based Classification
    Peng, Lizhi
    Yang, Bo
    Chen, Yuehui
    Zhou, Xiaoqing
    2016 12TH INTERNATIONAL CONFERENCE ON NATURAL COMPUTATION, FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY (ICNC-FSKD), 2016, : 419 - 425
  • [6] An Active Under-sampling Approach for Imbalanced Data Classification
    Yang, Zeping
    Gao, Daqi
    2012 FIFTH INTERNATIONAL SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE AND DESIGN (ISCID 2012), VOL 2, 2012, : 270 - 273
  • [7] Under-sampling method based on sample weight for imbalanced data
    Xiong B.
    Wang G.
    Deng W.
    Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2016, 53 (11) : 2613 - 2622
  • [8] An Under-Sampling Method with Support Vectors in Multi-class Imbalanced Data Classification
    Arafat, Md. Yasir
    Hoque, Sabera
    Xu, Shuxiang
    Farid, Dewan Md.
    2019 13TH INTERNATIONAL CONFERENCE ON SOFTWARE, KNOWLEDGE, INFORMATION MANAGEMENT AND APPLICATIONS (SKIMA), 2019,
  • [9] An Improved Under-sampling Imbalanced Classification Algorithm
    Yao, Baofeng
    Wang, Lei
    2021 13TH INTERNATIONAL CONFERENCE ON MEASURING TECHNOLOGY AND MECHATRONICS AUTOMATION (ICMTMA 2021), 2021, : 775 - 779
  • [10] A Hybrid Under-Sampling Method (HUSBoost) to Classify Imbalanced Data
    Popel, Mahmudul Hasan
    Hasib, Khan Md
    Habib, Syed Ahsan
    Shah, Faisal Muhammad
    2018 21ST INTERNATIONAL CONFERENCE OF COMPUTER AND INFORMATION TECHNOLOGY (ICCIT), 2018,