A novel ensemble over-sampling approach based Chebyshev inequality for imbalanced multi-label data

Cited by: 0
Authors
Ren, Weishuo [1 ,2 ]
Zheng, Yifeng [1 ,2 ]
Zhang, Wenjie [1 ,2 ]
Qing, Depeng [1 ,2 ]
Zeng, Xianlong [1 ,2 ]
Li, Guohe [3 ]
Affiliations
[1] Minnan Normal Univ, Sch Comp Sci, Zhangzhou 363000, Fujian, Peoples R China
[2] Fujian Prov Univ, Key Lab Data Sci & Intelligence Applicat, Zhangzhou 363000, Fujian, Peoples R China
[3] China Univ Petr, Coll Informat Sci & Engn, Beijing 102249, Peoples R China
Keywords
Multi-label classification; Imbalanced data; Over-sampling approach; Chebyshev inequality; Group optimization strategy; CLASSIFICATION;
DOI
10.1016/j.neucom.2024.128717
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
With the development of intelligent technology, data increasingly exhibit multi-label and imbalanced distributions, which degrade the performance of classification models. Addressing multi-label class imbalance has therefore become a hot research topic. Over-sampling approaches deal with imbalanced data by generating a superset of the original dataset. However, traditional over-sampling methods synthesize samples using only the central data point and its nearest-neighbor samples, without considering the impact of the data distribution. To address this issue, we propose an ensemble multi-label over-sampling algorithm (MLCIO) based on the Chebyshev inequality and a group optimization strategy. First, to generate more representative and diverse samples, the seed sample serves as the center of a sphere, and the Chebyshev inequality is used to ensure that synthetic samples fall within m standard deviations of it. Second, a group optimization ranking weighting approach is employed to obtain more reliable and stable label information. Finally, comparative experiments are conducted on 11 imbalanced datasets from various domains using different evaluation metrics. The results demonstrate that our proposal achieves better performance than the other approaches.
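The sampling idea in the abstract can be illustrated with a minimal sketch. It is not the MLCIO algorithm itself, whose details the abstract does not give. The function names, the definition of sigma (root-mean-square distance from the seed to its neighbors), and the uniform draw inside the sphere are all assumptions made for illustration. Chebyshev's inequality, P(|X - mu| >= m*sigma) <= 1/m^2, motivates the choice of radius: for m = 2, at most 25% of a distribution's mass lies beyond the sphere.

```python
import math
import random


def chebyshev_tail_bound(m):
    """Chebyshev's upper bound on P(|X - mu| >= m * sigma)."""
    if m <= 0:
        raise ValueError("m must be positive")
    return min(1.0, 1.0 / (m * m))


def synthesize_in_sphere(seed, neighbors, m=2.0, n_new=5, rng=None):
    """Hypothetical helper: draw n_new points inside a sphere centered on seed.

    Assumption: sigma is the root-mean-square distance from the seed to its
    neighbors, and the sphere radius is m * sigma.
    """
    rng = rng or random.Random()
    dim = len(seed)
    sq_dists = [sum((a - b) ** 2 for a, b in zip(nb, seed)) for nb in neighbors]
    sigma = math.sqrt(sum(sq_dists) / len(sq_dists))
    radius = m * sigma

    samples = []
    for _ in range(n_new):
        # A normalized Gaussian vector gives a uniformly random direction.
        direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(x * x for x in direction))
        # u ** (1 / dim) makes the points uniform in volume, not in radius.
        scale = radius * rng.random() ** (1.0 / dim) / norm
        samples.append([s + scale * x for s, x in zip(seed, direction)])
    return samples
```

A call such as `synthesize_in_sphere([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], m=2.0)` returns points whose distance from the seed never exceeds `m * sigma`. How labels are assigned to the synthetic points (the group optimization step in the abstract) is outside the scope of this sketch.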
Pages: 16
Related papers
39 in total
  • [1] Adaptive ensemble of self-adjusting nearest neighbor subspaces for multi-label drifting data streams
    Alberghini, Gavin
    Barbon, Sylvio, Jr.
    Cano, Alberto
    [J]. NEUROCOMPUTING, 2022, 481 : 228 - 248
  • [2] Learning multi-label scene classification
    Boutell, MR
    Luo, JB
    Shen, XP
    Brown, CM
    [J]. PATTERN RECOGNITION, 2004, 37 (09) : 1757 - 1771
  • [3] REMEDIAL-HwR: Tackling multilabel imbalance through label decoupling and data resampling hybridization
    Charte, Francisco
    Rivera, Antonio J.
    del Jesus, Maria J.
    Herrera, Francisco
    [J]. NEUROCOMPUTING, 2019, 326 : 110 - 122
  • [4] MLSMOTE: Approaching imbalanced multilabel learning through synthetic instance generation
    Charte, Francisco
    Rivera, Antonio J.
    del Jesus, Maria J.
    Herrera, Francisco
    [J]. KNOWLEDGE-BASED SYSTEMS, 2015, 89 : 385 - 397
  • [5] Addressing imbalance in multilabel classification: Measures and random resampling algorithms
    Charte, Francisco
    Rivera, Antonio J.
    del Jesus, Maria J.
    Herrera, Francisco
    [J]. NEUROCOMPUTING, 2015, 163 : 3 - 16
  • [6] Charte F, 2014, LECT NOTES COMPUT SC, V8480, P110
  • [7] Charte F, 2014, LECT NOTES COMPUT SC, V8669, P1, DOI 10.1007/978-3-319-10840-7_1
  • [8] Charte F, 2013, LECT NOTES COMPUT SC, V8073, P150, DOI 10.1007/978-3-642-40846-5_16
  • [9] Chawla N. V., 2004, ACM SIGKDD Explorations Newsletter, V6, P1, DOI 10.1145/1007730.1007733
  • [10] SMOTE: Synthetic minority over-sampling technique
    Chawla, Nitesh V.
    Bowyer, Kevin W.
    Hall, Lawrence O.
    Kegelmeyer, W. Philip
    [J]. 2002, American Association for Artificial Intelligence (16)