A novel ensemble over-sampling approach based Chebyshev inequality for imbalanced multi-label data

Cited by: 0

Authors:

Ren, Weishuo [1,2]

Zheng, Yifeng [1,2]

Zhang, Wenjie [1,2]

Qing, Depeng [1,2]

Zeng, Xianlong [1,2]

Li, Guohe [3]

Affiliations:

[1] Minnan Normal Univ, Sch Comp Sci, Zhangzhou 363000, Fujian, Peoples R China

[2] Fujian Prov Univ, Key Lab Data Sci & Intelligence Applicat, Zhangzhou 363000, Fujian, Peoples R China

[3] China Univ Petr, Coll Informat Sci & Engn, Beijing 102249, Peoples R China

Source:

NEUROCOMPUTING | 2025 / Volume 612

Keywords:

Multi-label classification; Imbalanced data; Over-sampling approach; Chebyshev inequality; Group optimization strategy; CLASSIFICATION;

DOI:

10.1016/j.neucom.2024.128717

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

With the development of intelligent technology, data increasingly exhibit multi-label and imbalanced distributions, which degrade the performance of classification models. Addressing multi-label class imbalance has therefore become an active research topic. Over-sampling approaches deal with imbalanced data by generating a superset of the original dataset. However, traditional over-sampling methods synthesize samples using only the central data point and its nearest neighbors, without considering the data distribution. To address this issue, we propose an ensemble multi-label over-sampling algorithm (MLCIO) based on the Chebyshev inequality and a group optimization strategy. First, to generate more representative and diverse samples, the seed sample serves as the center of a sphere, and the Chebyshev inequality is used to ensure that synthetic samples fall within m standard deviations of it. Second, a group optimization ranking weighting approach is employed to obtain more reliable and stable label information. Finally, comparative experiments are conducted on 11 imbalanced datasets from various domains using different evaluation metrics. The results demonstrate that the proposed method achieves better performance than the other approaches.
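
The sphere-constrained generation step mentioned in the abstract can be illustrated with a minimal sketch. It is not the MLCIO procedure itself: the abstract does not say how the standard deviation is computed or how points are drawn inside the sphere, so the scalar `sigma`, the uniform-in-volume sampling, and the helper names `chebyshev_bound` and `synthesize_in_ball` are all assumptions made for illustration. The only established fact used is Chebyshev's inequality, P(|X - mu| >= m * sigma) <= 1 / m^2.

```python
import math
import random


def chebyshev_bound(m: float) -> float:
    """Upper bound on P(|X - mu| >= m * sigma) given by Chebyshev's inequality."""
    if m <= 0:
        raise ValueError("m must be positive")
    return min(1.0, 1.0 / (m * m))


def synthesize_in_ball(seed, sigma, m, rng):
    """Draw one synthetic point uniformly from the ball of radius m * sigma
    centred on `seed`. Hypothetical helper, not the paper's algorithm."""
    dim = len(seed)
    # Random direction: normalise a vector of standard Gaussian draws.
    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(d * d for d in direction)) or 1.0
    # Scale the radius by u**(1/dim) so points are uniform in volume.
    radius = m * sigma * rng.random() ** (1.0 / dim)
    return [s + radius * d / norm for s, d in zip(seed, direction)]


rng = random.Random(0)            # fixed seed for reproducibility
seed_sample = [0.5, 1.0, -2.0]    # assumed minority-class seed instance
sigma, m = 0.2, 2.0               # assumed scalar spread and multiplier
synthetic = [synthesize_in_ball(seed_sample, sigma, m, rng) for _ in range(100)]
```

With m = 2, Chebyshev's inequality guarantees that at most a quarter of any distribution lies two or more standard deviations from its mean, which is why a radius of m * sigma is a distribution-free way to keep synthetic points in a region that real samples are likely to occupy.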


Pages: 16

