Imbalanced Data Over-Sampling Method Based on ISODATA Clustering

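Cited by: 1
Authors
Lv, Zhenzhe [1]
Liu, Qicheng [1]
Affiliation
[1] Yantai Univ, Sch Comp & Control Engn, Yantai 264000, Shandong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
imbalanced data; clustering; oversampling; ISODATA; SMOTE;
DOI
10.1587/transinf.2022EDP7190
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract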
Class imbalance is one of the challenges faced in the field of machine learning. Traditional classifiers have difficulty predicting minority class data, and if the imbalanced data is not processed, classifier performance is greatly reduced. To address the tendency of traditional classifiers to favor the majority class and ignore the minority class, an imbalanced data over-sampling method based on iterative self-organizing data analysis technique algorithm (ISODATA) clustering is proposed. The minority class is divided into different sub-clusters by ISODATA, and each sub-cluster is over-sampled according to the sampling ratio, so that the sampled minority class data also conforms to the imbalance of the original minority class data. The new dataset, composed of the new minority class data and the majority class data, is classified by SVM and Random Forest classifiers. Experiments on 12 datasets from the KEEL repository show that the method achieves better G-mean and F-value, improving classification accuracy.
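
The per-cluster over-sampling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the minority class has already been split into sub-clusters (by ISODATA or any other clustering method), that the "sampling ratio" means each sub-cluster receives synthetic samples in proportion to its size, and that new points are generated by SMOTE-style interpolation between two points of the same sub-cluster. The function names are hypothetical.

```python
import random


def allocate_by_cluster_size(cluster_sizes, n_new):
    """Split n_new synthetic samples across sub-clusters in proportion to size.

    Assumption: proportional allocation stands in for the paper's sampling ratio.
    """
    total = sum(cluster_sizes)
    quotas = [n_new * size // total for size in cluster_sizes]
    # Integer division can leave a remainder; give it to the largest clusters.
    remainder = n_new - sum(quotas)
    order = sorted(range(len(cluster_sizes)),
                   key=lambda i: cluster_sizes[i], reverse=True)
    for i in order[:remainder]:
        quotas[i] += 1
    return quotas


def interpolate(a, b, t):
    """Return the point a + t * (b - a) on the segment between a and b."""
    return [x + t * (y - x) for x, y in zip(a, b)]


def oversample_clusters(clusters, n_new, seed=0):
    """Generate n_new synthetic minority points, cluster by cluster.

    clusters: list of sub-clusters, each a list of feature vectors (assumed
    to come from a prior clustering step that is not shown here).
    """
    rng = random.Random(seed)
    quotas = allocate_by_cluster_size([len(c) for c in clusters], n_new)
    synthetic = []
    for points, quota in zip(clusters, quotas):
        for _ in range(quota):
            if len(points) == 1:
                # A singleton cluster has no partner point; duplicate it.
                synthetic.append(list(points[0]))
            else:
                a, b = rng.sample(points, 2)
                synthetic.append(interpolate(a, b, rng.random()))
    return quotas, synthetic
```

Because every synthetic point is interpolated inside a single sub-cluster, no point is placed in the gap between two distant groups of minority samples, which is the usual motivation for clustering before over-sampling.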
Pages: 1528-1536
Page count: 9