Instance-based entropy fuzzy support vector machine for imbalanced data

Cited by: 9
Authors
Cho, Poongjin [1]
Lee, Minhyuk [2]
Chang, Woojin [1]
Affiliations
[1] Seoul Natl Univ, Dept Ind Engn, 1 Gwanak Ro, Seoul 151742, South Korea
[2] Samsung Elect, Mobile Commun Business, Big Data Analyt Grp, Suwon, South Korea
Keywords
Fuzzy support vector machine; Imbalanced dataset; Entropy; Pattern recognition; Nearest neighbor; NEAREST-NEIGHBOR; LEARNING-MACHINE; DATA-SETS; CLASSIFICATION; SELECTION; ALGORITHMS; REDUCTION
DOI
10.1007/s10044-019-00851-x
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Imbalanced classification has been a major challenge for machine learning because many standard classifiers are designed for balanced datasets and tend to produce results biased toward the majority class. We modify the entropy fuzzy support vector machine (EFSVM) and introduce the instance-based entropy fuzzy support vector machine (IEFSVM). Both EFSVM and IEFSVM use the entropy information of the k-nearest neighbors to determine a fuzzy membership value for each sample, which sets the importance of that sample. IEFSVM considers how the entropy pattern of each sample changes as the neighborhood size, k, increases, whereas EFSVM uses a single entropy value computed from one fixed neighborhood size for all samples. By varying k, the fuzzy membership value can reflect how the composition of a sample's neighbors changes from near to far distance. Numerical experiments on 35 public and 12 real-world imbalanced datasets are performed to validate IEFSVM, and the area under the receiver operating characteristic curve (AUC) is used to compare its performance with other SVMs and machine learning methods. IEFSVM shows a much higher AUC value for datasets with a high imbalance ratio, implying that IEFSVM is effective in dealing with the class imbalance problem.
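The abstract describes the idea only at a high level, so the following is a minimal sketch of how neighbor entropy over several values of k could be turned into a per-sample fuzzy membership. It is not the authors' formulation: the function names, the averaging of entropies across k, the linear mapping `1 - beta * entropy`, and the choice to give every minority sample a membership of 1 are all assumptions made for illustration.

```python
import numpy as np


def neighbor_entropy(X, y, k_values, minority_label=1):
    """Binary class entropy (base 2) of each sample's k nearest neighbors.

    Returns an array of shape (n_samples, len(k_values)).
    Hypothetical helper; Euclidean distance is an assumption.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Pairwise squared Euclidean distances.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)
    # A sample must not count as its own neighbor.
    np.fill_diagonal(dist, np.inf)
    order = np.argsort(dist, axis=1)
    # Running count of minority-class neighbors, from nearest to farthest.
    is_minority = (y[order] == minority_label).astype(float)
    cum_minority = np.cumsum(is_minority, axis=1)

    entropies = np.empty((len(y), len(k_values)))
    for j, k in enumerate(k_values):
        p = cum_minority[:, k - 1] / k  # minority share among k neighbors
        q = 1.0 - p
        with np.errstate(divide="ignore", invalid="ignore"):
            # Treat 0 * log2(0) as 0.
            term_p = np.where(p > 0, p * np.log2(p), 0.0)
            term_q = np.where(q > 0, q * np.log2(q), 0.0)
        entropies[:, j] = -(term_p + term_q)
    return entropies


def fuzzy_membership(X, y, k_values=(3, 5, 7), beta=0.5, minority_label=1):
    """Illustrative membership: majority samples in mixed neighborhoods
    are down-weighted; minority samples keep full weight (assumption)."""
    y = np.asarray(y)
    mean_entropy = neighbor_entropy(X, y, k_values, minority_label).mean(axis=1)
    membership = 1.0 - beta * mean_entropy  # entropy lies in [0, 1]
    membership[y == minority_label] = 1.0
    return membership


# Toy data: four majority points far from the minority, one close to it.
X_demo = np.array([[0.0], [0.1], [0.2], [0.3], [4.9], [5.0], [5.2]])
y_demo = np.array([0, 0, 0, 0, 0, 1, 1])
m_demo = fuzzy_membership(X_demo, y_demo, k_values=(1, 3), beta=0.5)
```

In this toy example only the majority sample at 4.9, whose three nearest neighbors include both classes, receives a membership below 1. Such memberships could then be used as per-sample weights in a weighted SVM, for example through the `sample_weight` argument of scikit-learn's `SVC.fit`, which is one common way to approximate a fuzzy SVM and not necessarily what the paper does.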
Pages: 1183-1202
Number of pages: 20