Handling Imbalanced Dataset Using SVM and k-NN Approach

Cited by: 9
Authors
Wah, Yap Bee [1]
Abd Rahman, Hezlin Aryani [1]
He, Haibo [2, 3]
Bulgiba, Awang [4]
Affiliations
[1] Univ Teknol MARA Malaysia, Fac Comp & Math Sci, Shah Alam 40450, Malaysia
[2] Univ Rhode Isl, Dept Elect Comp & Biomed Engn, Kingston, RI 02881 USA
[3] Julius Ctr Univ Malaya, Kuala Lumpur, Malaysia
[4] Univ Malaya, Fac Med, Dept Social & Prevent Med, Kuala Lumpur 50603, Malaysia
Source
ADVANCES IN INDUSTRIAL AND APPLIED MATHEMATICS | 2016, Vol. 1750
Keywords
data mining; classification; imbalanced data; SVM; k-NN
DOI
10.1063/1.4954536
Chinese Library Classification
O29 [Applied Mathematics]
Discipline classification code
070104
Abstract
Data mining classification methods are affected when the data are imbalanced, that is, when one class is larger than the other in the case of a two-class dependent variable. Many new methods have been developed to handle imbalanced datasets. For binary classification tasks, the Support Vector Machine (SVM) is one of the methods reported to give high accuracy in predictive modeling compared with other techniques such as Logistic Regression and Discriminant Analysis. The strength of SVM lies in the robustness of its algorithm and its capability to integrate with kernel-based learning, which results in a more flexible analysis and an optimized solution. Another popular approach to handling imbalanced data is sampling, such as random undersampling, random oversampling and synthetic sampling. The application of Nearest Neighbours techniques in the sampling approach has been seen as having a bigger advantage than other methods, as it can handle both structured and non-structured data. Some studies implement an ensemble of SVM and Nearest Neighbours with good results. This paper discusses the various methods for handling imbalanced data and illustrates the use of SVM and k-Nearest Neighbours (k-NN) on a real data set.
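The resampling ideas named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' procedure: the function names (`random_undersample`, `random_oversample`, `knn_predict`), the toy data, and the choice of k = 5 are assumptions made here for illustration, and the SVM step is left out to keep the example free of third-party libraries. It shows how a plain k-NN majority vote can be swamped by the larger class and how random oversampling changes the vote.

```python
import random
from collections import Counter


def _group_by_class(X, y):
    """Collect feature rows per class label."""
    groups = {}
    for features, label in zip(X, y):
        groups.setdefault(label, []).append(features)
    return groups


def random_undersample(X, y, seed=0):
    """Randomly drop rows until every class has as many rows as the smallest class."""
    rng = random.Random(seed)
    groups = _group_by_class(X, y)
    target = min(len(rows) for rows in groups.values())
    X_out, y_out = [], []
    for label, rows in groups.items():
        for features in rng.sample(rows, target):
            X_out.append(features)
            y_out.append(label)
    return X_out, y_out


def random_oversample(X, y, seed=0):
    """Randomly duplicate rows until every class has as many rows as the largest class."""
    rng = random.Random(seed)
    groups = _group_by_class(X, y)
    target = max(len(rows) for rows in groups.values())
    X_out, y_out = [], []
    for label, rows in groups.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for features in rows + extra:
            X_out.append(features)
            y_out.append(label)
    return X_out, y_out


def knn_predict(X_train, y_train, query, k=5):
    """Majority vote among the k nearest training rows (squared Euclidean distance)."""
    ranked = sorted(
        (sum((a - b) ** 2 for a, b in zip(features, query)), label)
        for features, label in zip(X_train, y_train)
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: 8 majority rows (class 0) and 2 minority rows (class 1).
X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.3, 0.3),
     (0.1, 0.4), (0.4, 0.1), (0.2, 0.3), (0.3, 0.2),
     (3.0, 3.0), (3.2, 2.9)]
y = [0] * 8 + [1] * 2

query = (3.1, 3.0)  # lies next to the two minority rows

# With only 2 minority rows, 3 of the 5 nearest neighbours come from class 0.
before = knn_predict(X, y, query, k=5)

# After oversampling, the duplicated minority rows fill the neighbourhood.
X_over, y_over = random_oversample(X, y)
after = knn_predict(X_over, y_over, query, k=5)

print("before oversampling:", before)
print("after oversampling:", after)
```

On this toy data the query point is classified as the majority class before oversampling and as the minority class afterwards. Random undersampling would instead shrink class 0 to two rows, at the cost of discarding majority-class information.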
Pages: 8