Subspace-based minority oversampling for imbalance classification

Cited by: 21
Authors
Li, Tianjun [1 ,2 ]
Wang, Yingxu [3 ]
Liu, Licheng [5 ]
Chen, Long [4 ]
Chen, C. L. Philip [1 ,2 ]
Affiliations
[1] South China Univ Technol, Sch Comp Sci & Engn, Guangzhou 510006, Peoples R China
[2] Brain & Affect Cognit Res Ctr, Pazhou Lab, Guangzhou 510335, Peoples R China
[3] Univ Jinan, Shandong Prov Key Lab Network Based Intelligent Co, Jinan 250022, Peoples R China
[4] Univ Macau, Fac Sci & Technol, Dept Comp & Informat Sci, Taipa, Macau, Peoples R China
[5] Hunan Univ, Dept Elect & Informat Engn, Hunan 410082, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Class imbalance; Minority over-sampling; Low-rank representation; Matrix completion; SMOTE; SVM;
DOI
10.1016/j.ins.2022.11.108
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812 ;
Abstract
In pattern classification, the class imbalance problem arises when the number of observations in some classes differs significantly from that in others, which biases the learned classifiers. One possible solution is to re-balance the training set by over-sampling the minority class. However, over-sampling tends to push the classification boundary toward the majority class, so recall increases while precision decreases. To avoid this and better handle class imbalance, this paper proposes a new over-sampling method, Subspace-based Minority Over-Sampling (SMO). The approach assumes that each category of samples is formed by common and unique characteristics, and that these characteristics can be extracted via subspaces. To obtain balanced data, the common part is over-sampled to depict the minority class more accurately, and the unique part can be expanded by generative methods. The balanced data are then obtained by restoring the generated products of the subspace to the original space. The experimental results demonstrate that SMO can model complex data distributions and outperforms both classical and newly designed over-sampling algorithms. SMO can also be used to generate simple images, and its generated MNIST samples can be clearly identified by both human vision and machine vision. (c) 2022 Elsevier Inc. All rights reserved.
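The general idea in the abstract (generate in a subspace, then restore to the original space) can be illustrated with a minimal sketch. This is not the paper's SMO algorithm: it assumes a plain truncated-SVD subspace and SMOTE-style interpolation between nearest neighbours, and the function name `subspace_oversample` and its parameters `rank` and `k` are hypothetical choices made for illustration only.

```python
import numpy as np

def subspace_oversample(X_min, n_new, rank=2, k=3, seed=0):
    """Illustrative only (assumed scheme, not the authors' SMO):
    interpolate minority samples in a low-rank subspace, then map back."""
    rng = np.random.default_rng(seed)
    mean = X_min.mean(axis=0)
    # Truncated SVD of the centred data; rows of `basis` span the subspace.
    _, _, Vt = np.linalg.svd(X_min - mean, full_matrices=False)
    basis = Vt[:rank]                          # shape (rank, n_features)
    Z = (X_min - mean) @ basis.T               # subspace coordinates
    # k nearest neighbours of each sample, measured in the subspace.
    d = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)                # exclude self-matches
    neighbours = np.argsort(d, axis=1)[:, :k]
    # SMOTE-style step: random point on the segment between a sample
    # and one of its k neighbours.
    i = rng.integers(0, len(Z), size=n_new)
    j = neighbours[i, rng.integers(0, k, size=n_new)]
    lam = rng.random((n_new, 1))
    Z_new = Z[i] + lam * (Z[j] - Z[i])
    # Restore the generated points to the original feature space.
    return Z_new @ basis + mean

# Usage: 10 minority samples in 5 dimensions that lie on a 2-D subspace.
rng = np.random.default_rng(1)
X_min = rng.normal(size=(10, 2)) @ rng.normal(size=(2, 5))
X_new = subspace_oversample(X_min, n_new=20)
print(X_new.shape)
```

Because the interpolation happens in subspace coordinates, every synthetic sample stays on the affine subspace spanned by the retained singular vectors; `k` must be smaller than the number of minority samples.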
Pages: 371-388
Page count: 18