A novel ensemble over-sampling approach based Chebyshev inequality for imbalanced multi-label data

Times cited: 0
Authors
Ren, Weishuo [1,2]
Zheng, Yifeng [1,2]
Zhang, Wenjie [1,2]
Qing, Depeng [1,2]
Zeng, Xianlong [1,2]
Li, Guohe [3]
Affiliations
[1] Minnan Normal Univ, Sch Comp Sci, Zhangzhou 363000, Fujian, Peoples R China
[2] Fujian Prov Univ, Key Lab Data Sci & Intelligence Applicat, Zhangzhou 363000, Fujian, Peoples R China
[3] China Univ Petr, Coll Informat Sci & Engn, Beijing 102249, Peoples R China
Keywords
Multi-label classification; Imbalanced data; Over-sampling approach; Chebyshev inequality; Group optimization strategy; Classification
DOI
10.1016/j.neucom.2024.128717
CLC number (Chinese Library Classification)
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
With the development of intelligent technology, data increasingly exhibit multi-label and imbalanced distributions, which degrade the performance of classification models. Addressing multi-label class imbalance has therefore become an active research topic. Over-sampling approaches deal with imbalanced data by generating a superset of the original dataset. However, traditional over-sampling methods synthesize new samples from only a central data point and its nearest neighbors, without considering the influence of the data distribution. To address this issue, this paper proposes MLCIO, an ensemble multi-label over-sampling algorithm based on Chebyshev's inequality and a group optimization strategy. First, to generate more representative and diverse samples, the seed sample is taken as the center of a sphere, and Chebyshev's inequality is used to keep synthetic samples within m standard deviations of it. Second, a group optimization ranking weighting approach is employed to obtain more reliable and stable label information. Finally, comparative experiments are conducted on 11 imbalanced datasets from various domains using multiple evaluation metrics. The results show that the proposed method outperforms the compared approaches.
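The abstract only outlines the generation step, so the following is a rough illustration rather than MLCIO itself. Chebyshev's inequality states that for any distribution with mean μ and standard deviation σ, P(|X − μ| ≥ mσ) ≤ 1/m² for m > 0, so a region of m standard deviations around the center holds at least a 1 − 1/m² share of the mass. The Python sketch below assumes that the spread is the per-feature standard deviation of the seed and its nearest neighbors, draws synthetic points inside the ellipsoid (a sphere in standardized units) with semi-axes m·σ_j around the seed, and replaces the paper's group optimization ranking weighting with a hypothetical 1/rank-weighted label vote. All function names and parameters are illustrative assumptions.

```python
import math
import random


def feature_std(points):
    """Per-feature population standard deviation of equal-length vectors."""
    n = len(points)
    stds = []
    for j in range(len(points[0])):
        mean = sum(p[j] for p in points) / n
        var = sum((p[j] - mean) ** 2 for p in points) / n
        stds.append(math.sqrt(var))
    return stds


def chebyshev_fraction_bound(m):
    """Chebyshev upper bound on the share of values lying at least
    m standard deviations from the mean (1/m**2, capped at 1)."""
    if m <= 0:
        raise ValueError("m must be positive")
    return min(1.0, 1.0 / (m * m))


def synthesize_around_seed(seed, neighbours, m=2.0, n_new=5, rng=None):
    """Hypothetical generator (not MLCIO): draw n_new points inside the
    ellipsoid centred on `seed` whose semi-axis along feature j is
    m * sigma_j, with sigma_j the std of the seed plus its neighbours."""
    rng = rng or random.Random()
    sigma = feature_std([seed] + neighbours)
    dims = len(seed)
    samples = []
    for _ in range(n_new):
        # Random unit direction from a normalised Gaussian vector.
        direction = [rng.gauss(0.0, 1.0) for _ in range(dims)]
        norm = math.sqrt(sum(d * d for d in direction)) or 1.0
        # Radius fraction in [0, 1); a uniform radius favours points near the seed.
        r = rng.random()
        samples.append(
            [seed[j] + r * (direction[j] / norm) * m * sigma[j] for j in range(dims)]
        )
    return samples


def rank_weighted_labels(neighbour_labels, threshold=0.5):
    """Hypothetical stand-in for the label step: neighbours are assumed
    sorted nearest-first and weighted 1/rank; a label is set when its
    weighted share reaches `threshold`."""
    weights = [1.0 / (i + 1) for i in range(len(neighbour_labels))]
    total = sum(weights)
    result = []
    for j in range(len(neighbour_labels[0])):
        score = sum(w * labels[j] for w, labels in zip(weights, neighbour_labels))
        result.append(1 if score / total >= threshold else 0)
    return result


# Usage: one minority seed, its three nearest neighbours and their label vectors.
rng = random.Random(0)
seed = [0.0, 0.0]
neighbours = [[1.0, 2.0], [-1.0, -2.0], [0.5, -0.5]]
synthetic = synthesize_around_seed(seed, neighbours, m=2.0, n_new=3, rng=rng)
labels = rank_weighted_labels([[1, 0, 1], [1, 1, 0], [0, 0, 1]])
print(len(synthetic), labels)          # 3 [1, 0, 1]
print(chebyshev_fraction_bound(2.0))   # 0.25
```

With m = 2 the bound says that at most 25% of any distribution lies two or more standard deviations from its mean, which is the intuition for confining generation to that region. How MLCIO actually chooses m, the neighborhood, and the label weights is not given in the abstract and is not reproduced here.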
Pages: 16