Handling Imbalanced Dataset Using SVM and k-NN Approach

Cited by: 9
Authors
Wah, Yap Bee [1]
Abd Rahman, Hezlin Aryani [1]
He, Haibo [2, 3]
Bulgiba, Awang [4]
Affiliations
[1] Univ Teknol MARA Malaysia, Fac Comp & Math Sci, Shah Alam 40450, Malaysia
[2] Univ Rhode Isl, Dept Elect Comp & Biomed Engn, Kingston, RI 02881 USA
[3] Julius Ctr Univ Malaya, Kuala Lumpur, Malaysia
[4] Univ Malaya, Fac Med, Dept Social & Prevent Med, Kuala Lumpur 50603, Malaysia
Source
ADVANCES IN INDUSTRIAL AND APPLIED MATHEMATICS | 2016 / Vol. 1750
Keywords
data mining; classification; imbalanced data; SVM; k-NN;
DOI
10.1063/1.4954536
Chinese Library Classification number
O29 [Applied Mathematics];
Subject classification code
070104;
Abstract
Data mining classification methods are affected when the data are imbalanced, that is, when one class is larger than the other in the case of a two-class dependent variable. Many new methods have been developed to handle imbalanced datasets. For binary classification tasks, the Support Vector Machine (SVM) is one of the methods reported to give high accuracy in predictive modeling compared with other techniques such as Logistic Regression and Discriminant Analysis. The strength of SVM lies in the robustness of its algorithm and its capability to integrate with kernel-based learning, which results in a more flexible analysis and an optimized solution. Another popular approach to handling imbalanced data is sampling, such as random undersampling, random oversampling and synthetic sampling. The application of Nearest Neighbours techniques in the sampling approach has been seen as having a bigger advantage than other methods, as it can handle both structured and non-structured data. Some studies implement an ensemble of SVM and Nearest Neighbours with good results. This paper discusses the various methods for handling imbalanced data and illustrates the use of SVM and k-Nearest Neighbours (k-NN) on a real data set.
Pages: 8
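The abstract names random undersampling, random oversampling and k-NN without showing them. The following is a minimal sketch of those three ideas, assuming a toy two-class dataset invented for illustration; the function names (`random_oversample`, `random_undersample`, `knn_predict`) and the data are hypothetical and are not taken from the paper, and the SVM step is not covered here.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of each smaller class until
    every class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


def random_undersample(X, y, seed=0):
    """Keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    X_out, y_out = [], []
    for label in counts:
        pool = [x for x, lab in zip(X, y) if lab == label]
        for x in rng.sample(pool, target):
            X_out.append(x)
            y_out.append(label)
    return X_out, y_out


def knn_predict(X_train, y_train, query, k=3):
    """Classify `query` by majority vote among its k nearest training
    rows, using squared Euclidean distance."""
    ranked = sorted(
        zip(X_train, y_train),
        key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], query)),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: six majority rows (class 0), two minority rows (class 1).
X = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2),
     (0.2, 0.4), (0.4, 0.1), (1.0, 1.0), (0.9, 1.1)]
y = [0, 0, 0, 0, 0, 0, 1, 1]

X_over, y_over = random_oversample(X, y)
X_under, y_under = random_undersample(X, y)
print(Counter(y_over), Counter(y_under))
print(knn_predict(X_over, y_over, (1.0, 1.05), k=3))
```

Oversampling keeps all majority rows at the cost of duplicated minority rows, while undersampling discards majority rows; which trade-off suits a given dataset is an empirical question.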