A design of information granule-based under-sampling method in imbalanced data classification

Cited by: 0

Authors:

Tianyu Liu

Xiubin Zhu

Witold Pedrycz

Zhiwu Li

Affiliations:

[1] Xidian University, School of Electro

[2] University of Alberta, Mechanical Engineering

[3] Macau University of Science and Technology, Department of Electrical and Computer Engineering

[4] King Abdulaziz University, Institute of Systems Engineering

[5] Guilin University of Electronic Technology, Faculty of Engineering

Source:

Soft Computing | 2020 / Vol. 24

Keywords:

Imbalanced data; Information granule; Support vector machine (SVM); K-nearest-neighbor (KNN); Under-sampling

DOI:

Not available

Chinese Library Classification number:

Subject classification number:

Abstract:

In numerous real-world problems, we are faced with difficulties in learning from imbalanced data. The classification performance of a "standard" classifier (learning algorithm) is evidently hindered by the imbalanced distribution of data. Over-sampling and under-sampling methods have been researched extensively with the aim of increasing the prediction accuracy over the minority class. However, traditional under-sampling methods tend to ignore important characteristics pertinent to the majority class. In this paper, a novel under-sampling method based on information granules is proposed. The method exploits the concepts and algorithms of granular computing. First, information granules are built around selected patterns coming from the majority class to capture the essence of the data belonging to this class. Subsequently, the resultant information granules are evaluated in terms of their quality, and those with the highest specificity values are selected. Next, the selected numeric data are augmented by weights implied by the size of the information granules. Finally, a support vector machine and a K-nearest-neighbor classifier, both regarded here as representative classifiers, are built based on the weighted data. Experimental studies are carried out using synthetic data as well as a suite of imbalanced data sets coming from public machine learning repositories. The experimental results quantify the performance of the support vector machine and K-nearest-neighbor classifiers combined with the under-sampling method based on information granules. The results demonstrate that this performance is superior to that obtained for the same classifiers endowed with conventional under-sampling methods. In general, the improvement in performance expressed in terms of G-means is over 10% when applying information granule under-sampling compared with random under-sampling.
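
The steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the record gives no construction details, so the spherical granule shape, the radius rule (distance to the nearest minority pattern), the specificity formula, and the names `granule_undersample` and `g_mean` are all assumptions made for illustration. The G-means measure is taken here in its usual sense, the geometric mean of the recall on the minority class and the recall on the majority class.

```python
import math


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of minority-class recall and majority-class recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    true_positive_rate = tp / (tp + fn) if tp + fn else 0.0
    true_negative_rate = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(true_positive_rate * true_negative_rate)


def granule_undersample(majority, minority, n_keep):
    """Hypothetical granule-based under-sampling (assumed construction).

    Each majority pattern is the centre of a spherical granule whose radius
    is the distance to the nearest minority pattern (an assumption). The
    granule's size is the number of majority patterns it covers, and its
    specificity decreases as the radius grows (also an assumption).
    Returns the n_keep most specific centres with size-based weights.
    """
    # Largest pairwise distance in the majority class, used to scale radii.
    max_dist = max(math.dist(a, b) for a in majority for b in majority) or 1.0
    granules = []
    for centre in majority:
        radius = min(math.dist(centre, m) for m in minority)
        size = sum(1 for x in majority if math.dist(centre, x) <= radius)
        granule_specificity = max(0.0, 1.0 - radius / max_dist)
        granules.append((granule_specificity, size, centre))
    # Keep the granules with the highest specificity values.
    granules.sort(key=lambda g: g[0], reverse=True)
    kept = granules[:n_keep]
    total = sum(size for _, size, _ in kept)
    # Weight each retained pattern by the relative size of its granule.
    return [(centre, size / total) for _, size, centre in kept]


if __name__ == "__main__":
    majority = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
    minority = [(1.0, 1.0), (1.2, 1.0)]
    for centre, weight in granule_undersample(majority, minority, n_keep=2):
        print(centre, round(weight, 3))
    print(round(g_mean([1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 1]), 4))
```

The weighted patterns returned by `granule_undersample` would then be passed, together with the minority class, to a classifier that accepts per-sample weights. That step is omitted because the record does not say how the weights enter the SVM or KNN models.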

Pages: 17333-17347

Number of pages: 14
