Hellinger Distance Based Oversampling Method to Solve Multi-class Imbalance Problem

Cited by: 0
Authors
Kumari, Amisha [1]
Thakar, Urjita [1]
Affiliations
[1] Shri GS Inst Tech & Sci, Dept Comp Engn, Indore, Madhya Pradesh, India
Source
2017 7TH INTERNATIONAL CONFERENCE ON COMMUNICATION SYSTEMS AND NETWORK TECHNOLOGIES (CSNT) | 2017
Keywords
Classification; Multi-class; Class imbalance problem; Hellinger distance; Oversampling;
DOI
10.1109/CSNT.2017.26
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Classification is a widely used technique for predicting the group membership of data samples in a dataset. Multi-class (multinomial) classification is the problem of assigning instances to one of more than two classes. As technology has advanced, multi-class data has grown more complex, which leads to the class imbalance problem. With an imbalanced dataset, a machine learning algorithm cannot make accurate predictions. This paper therefore proposes a Hellinger distance based oversampling method. It balances the dataset so that the minority class can be identified with high accuracy without affecting the accuracy of the majority class. The method generates new synthetic data to achieve a balanced class ratio. It was tested on five benchmark datasets using two standard classifiers, KNN and C4.5, and evaluated with precision, recall, and F-measure for both classifiers. It is observed that Hellinger distance reduces the risk of class overlap and data skewness. The results show a 20% increase in classification accuracy compared with classifying the imbalanced multi-class dataset.
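For orientation, the Hellinger distance named in the abstract can be sketched for two discrete probability distributions as follows. This is only the standard textbook definition, H(P, Q) = sqrt(0.5 * sum((sqrt(p_i) - sqrt(q_i))^2)); the abstract does not say how the authors apply it when generating synthetic samples, so the function name `hellinger` and the example distributions are illustrative assumptions, not the paper's method.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete distributions.

    p and q are equal-length sequences of probabilities (each summing to 1).
    The result lies in [0, 1]: 0 for identical distributions, 1 for
    distributions with disjoint support.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared / 2.0)


# Hypothetical binned feature distributions for two classes.
majority = [0.7, 0.2, 0.1]
minority = [0.1, 0.2, 0.7]
print(hellinger(majority, minority))
```

Because the distance is computed from normalized distributions rather than raw counts, it does not depend on how many samples each class has, which is one reason it is often considered for skewed data.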
Pages: 137-141
Page count: 5