A novel ensemble over-sampling approach based Chebyshev inequality for imbalanced multi-label data

Cited by: 0

Authors:

Ren, Weishuo [1,2]

Zheng, Yifeng [1,2]

Zhang, Wenjie [1,2]

Qing, Depeng [1,2]

Zeng, Xianlong [1,2]

Li, Guohe [3]

Affiliations:

[1] Minnan Normal Univ, Sch Comp Sci, Zhangzhou 363000, Fujian, Peoples R China

[2] Fujian Prov Univ, Key Lab Data Sci & Intelligence Applicat, Zhangzhou 363000, Fujian, Peoples R China

[3] China Univ Petr, Coll Informat Sci & Engn, Beijing 102249, Peoples R China

Source:

NEUROCOMPUTING | 2025 / Vol. 612

Keywords:

Multi-label classification; Imbalanced data; Over-sampling approach; Chebyshev inequality; Group optimization strategy; CLASSIFICATION;

DOI:

10.1016/j.neucom.2024.128717

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

With the development of intelligent technology, data increasingly exhibit multi-label and imbalanced distributions, which degrade the performance of classification models. Addressing multi-label class imbalance has therefore become an active research topic. Over-sampling approaches deal with imbalanced data by generating a superset of the original dataset. However, traditional over-sampling methods synthesize samples using only the central data point and its nearest neighbors, without considering the impact of the data distribution. To address this issue, we propose an ensemble multi-label over-sampling algorithm (MLCIO) based on the Chebyshev inequality and a group optimization strategy. First, to generate more representative and diverse samples, the seed sample is taken as the center of a sphere, and the Chebyshev inequality is used to ensure that synthetic samples fall within m standard deviations of it. Second, a group optimization ranking weighting approach is employed to obtain more reliable and stable label information. Finally, comparative experiments are conducted on 11 imbalanced datasets from various domains using different evaluation metrics. The results demonstrate that our proposal achieves better performance than the other approaches.
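The sampling idea in the abstract can be illustrated with a small sketch. This is not the authors' MLCIO algorithm: the function names, the use of the standard deviation of seed-to-neighbor distances as the spread, and the uniform-in-ball sampling are all assumptions made for illustration. The only fact relied on is Chebyshev's inequality, which states that at least 1 - 1/m^2 of any distribution with finite variance lies within m standard deviations of its mean.

```python
import numpy as np


def chebyshev_bound(m):
    """Chebyshev lower bound on P(|X - mu| < m * sigma)."""
    if m <= 0:
        raise ValueError("m must be positive")
    return max(0.0, 1.0 - 1.0 / m**2)


def sample_in_sphere(seed_point, neighbors, m=2.0, n_samples=5, rng=None):
    """Draw synthetic points inside a ball of radius m * sigma around a seed.

    Hypothetical helper: sigma is taken here as the standard deviation of
    the distances from the seed to its neighbors (an assumption, not the
    definition used in the paper).
    """
    rng = np.random.default_rng(rng)
    seed_point = np.asarray(seed_point, dtype=float)
    neighbors = np.asarray(neighbors, dtype=float)

    sigma = np.linalg.norm(neighbors - seed_point, axis=1).std()
    radius = m * sigma
    dim = seed_point.shape[0]

    # Random unit directions, obtained by normalizing Gaussian vectors.
    directions = rng.normal(size=(n_samples, dim))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)

    # Scaling by u**(1/dim) spreads the points uniformly over the ball.
    radii = radius * rng.random(n_samples) ** (1.0 / dim)
    return seed_point + directions * radii[:, None]
```

With `m=2.0`, `chebyshev_bound` gives 0.75, so a radius of two standard deviations covers at least three quarters of the mass regardless of the underlying distribution. Larger values of m widen the region and would make the synthetic samples more diverse, at the cost of moving them further from the seed.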


Pages: 16
