A multi-manifold learning based instance weighting and under-sampling for imbalanced data classification problems

Cited by: 6
Authors
Feizi, Tayyebe [1]
Moattar, Mohammad Hossein [1]
Tabatabaee, Hamid [1]
Affiliation
[1] Islamic Azad Univ, Dept Comp Engn, Mashhad Branch, Mashhad, Iran
Keywords
Imbalanced data; Classification; Under-sampling; Multi-manifold learning; Reduction algorithm; SMOTE
DOI
10.1186/s40537-023-00832-2
Chinese Library Classification (CLC) number
TP301 [Theory and methods]
Subject classification code
081202
Abstract
Under-sampling is a technique for overcoming the class imbalance problem; however, deciding which instances to drop, and measuring how informative they are, remains an important concern. This paper offers a new perspective on this question by exploiting the structure of the data to decide on the importance of each data point. To this end, a multi-manifold learning approach is proposed. Manifolds represent the underlying structure of the data and can help extract a latent space for the data distribution. However, there is no evidence that a single manifold can be relied on to capture the local neighborhood structure of a dataset. Therefore, this paper proposes an ensemble of manifold learning approaches and evaluates each manifold with an information-loss-based heuristic. After the optimality score of each manifold has been computed, the centrality and marginality degrees of the samples are computed on each manifold and weighted by that manifold's score. A gradual elimination approach is then applied, which tries to balance the classes while avoiding a drop in the F-measure on the validation dataset. The proposed method is evaluated on 22 imbalanced datasets from the KEEL and UCI repositories using several classification measures. The experimental results show that the proposed approach is more effective than other similar approaches and substantially outperforms earlier ones, especially when the imbalance ratio is very high.
Pages: 36
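The abstract describes the pipeline only at a high level, so the following Python sketch illustrates its two stages under explicit assumptions; it is not the authors' implementation. The per-manifold embeddings and their optimality scores are taken as given (the information-loss heuristic is not reproduced). The centrality formula (inverse mean k-NN distance), the marginality formula (fraction of k nearest neighbours from another class), the drop priority (marginality minus centrality), the 1-NN base classifier, binary 0/1 labels, and all function names (`sample_scores`, `gradual_undersample`, and so on) are hypothetical choices made for illustration.

```python
import math


def knn(points, i, k):
    """Indices of the k nearest neighbours of points[i] (Euclidean), excluding i."""
    ranked = sorted((math.dist(points[i], points[j]), j)
                    for j in range(len(points)) if j != i)
    return [j for _, j in ranked[:k]]


def sample_scores(embeddings, manifold_scores, labels, k=3):
    """Score-weighted centrality and marginality of every sample.

    embeddings[m][i] holds sample i's coordinates on manifold m, and
    manifold_scores[m] is that manifold's optimality score (assumed given).
    Hypothetical per-manifold definitions, not the paper's formulas:
      centrality  = 1 / (1 + mean distance to the k nearest neighbours)
      marginality = fraction of those neighbours with a different label
    """
    total = sum(manifold_scores)
    n = len(labels)
    cent, marg = [0.0] * n, [0.0] * n
    for emb, score in zip(embeddings, manifold_scores):
        w = score / total  # normalise manifold scores so the weights sum to 1
        for i in range(n):
            nb = knn(emb, i, k)
            mean_d = sum(math.dist(emb[i], emb[j]) for j in nb) / len(nb)
            cent[i] += w / (1.0 + mean_d)
            marg[i] += w * sum(labels[j] != labels[i] for j in nb) / len(nb)
    return cent, marg


def predict_1nn(train_x, train_y, x):
    """1-nearest-neighbour prediction (a stand-in for any base classifier)."""
    j = min(range(len(train_x)), key=lambda t: math.dist(train_x[t], x))
    return train_y[j]


def f_measure(train_x, train_y, val_x, val_y, pos=1):
    """Validation F-measure, treating the minority label `pos` as positive."""
    tp = fp = fn = 0
    for x, y in zip(val_x, val_y):
        p = predict_1nn(train_x, train_y, x)
        tp += p == pos and y == pos
        fp += p == pos and y != pos
        fn += p != pos and y == pos
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def gradual_undersample(x, y, priority, val_x, val_y, majority=0):
    """Drop majority samples one at a time, highest `priority` first, until the
    classes are balanced or the validation F-measure would decrease.
    Returns the indices of the retained training samples."""
    def score(idx):
        return f_measure([x[i] for i in idx], [y[i] for i in idx],
                         val_x, val_y, pos=1 - majority)

    keep = list(range(len(x)))
    order = sorted((i for i in keep if y[i] == majority),
                   key=lambda i: -priority[i])
    n_minority = sum(1 for v in y if v != majority)
    best = score(keep)
    for i in order:
        if sum(1 for j in keep if y[j] == majority) <= n_minority:
            break  # classes are balanced
        trial = [j for j in keep if j != i]
        f = score(trial)
        if f < best:
            break  # removing this sample would lower the validation F-measure
        keep, best = trial, f
    return keep


# Toy usage: a single "manifold" that is just the original 1-D space.
x = [(0.0,), (1.0,), (2.0,), (3.0,), (10.0,), (11.0,)]
y = [0, 0, 0, 0, 1, 1]
cent, marg = sample_scores([x], [1.0], y, k=1)
priority = [m - c for c, m in zip(cent, marg)]  # hypothetical: marginal, sparse points first
kept = gradual_undersample(x, y, priority, [(1.5,), (10.5,)], [0, 1])
print(kept)  # [2, 3, 4, 5]
```

In the toy run, two majority samples are removed without lowering the validation F-measure, and elimination stops once both classes have two samples. Replacing `predict_1nn` with the classifier actually used in an experiment, and the embeddings with outputs of real manifold learners, would be needed before drawing any conclusions from such a sketch.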