Handling Imbalanced Dataset Using SVM and k-NN Approach

Cited by: 9
Authors
Wah, Yap Bee [1]
Abd Rahman, Hezlin Aryani [1]
He, Haibo [2, 3]
Bulgiba, Awang [4]
Affiliations
[1] Univ Teknol MARA Malaysia, Fac Comp & Math Sci, Shah Alam 40450, Malaysia
[2] Univ Rhode Isl, Dept Elect Comp & Biomed Engn, Kingston, RI 02881 USA
[3] Julius Ctr Univ Malaya, Kuala Lumpur, Malaysia
[4] Univ Malaya, Fac Med, Dept Social & Prevent Med, Kuala Lumpur 50603, Malaysia
Source
ADVANCES IN INDUSTRIAL AND APPLIED MATHEMATICS | 2016, Vol. 1750
Keywords
data mining; classification; imbalanced data; SVM; k-NN
DOI
10.1063/1.4954536
Chinese Library Classification (CLC) number
O29 [Applied Mathematics]
Discipline classification code
070104
Abstract
Data mining classification methods are affected when the data are imbalanced, that is, when one class of a two-class dependent variable is larger than the other. Many new methods have been developed to handle imbalanced datasets. For binary classification tasks, Support Vector Machine (SVM) is one of the methods reported to give high accuracy in predictive modeling compared to other techniques such as Logistic Regression and Discriminant Analysis. The strength of SVM lies in the robustness of its algorithm and its capability to integrate with kernel-based learning, which results in a more flexible analysis and an optimized solution. Another popular approach to handling imbalanced data is sampling, such as random undersampling, random oversampling and synthetic sampling. The application of Nearest Neighbours techniques in the sampling approach has been seen as having a bigger advantage than other methods, as it can handle both structured and non-structured data. Some studies implement an ensemble of SVM and Nearest Neighbours with good results. This paper discusses the various methods for handling imbalanced data and illustrates the use of SVM and k-Nearest Neighbours (k-NN) on a real data set.
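The random undersampling and random oversampling mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `random_undersample` and `random_oversample`, the fixed seed, and the toy 90/10 data set are all assumptions made for illustration, and only the Python standard library is used.

```python
import random
from collections import Counter


def _group_by_class(X, y):
    """Collect the feature rows of each class label (hypothetical helper)."""
    by_class = {}
    for features, label in zip(X, y):
        by_class.setdefault(label, []).append(features)
    return by_class


def random_undersample(X, y, seed=0):
    """Randomly drop rows until every class matches the smallest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(X, y)
    target = min(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        # Sample without replacement: each kept row is an original row.
        for features in rng.sample(rows, target):
            X_out.append(features)
            y_out.append(label)
    return X_out, y_out


def random_oversample(X, y, seed=0):
    """Randomly duplicate rows until every class matches the largest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(X, y)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        # Sample with replacement to fill the gap up to the target size.
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for features in rows + extra:
            X_out.append(features)
            y_out.append(label)
    return X_out, y_out


# Toy imbalanced data (assumed): 90 majority rows (class 0), 10 minority rows (class 1).
X = [[float(i)] for i in range(100)]
y = [0] * 90 + [1] * 10

X_under, y_under = random_undersample(X, y)
X_over, y_over = random_oversample(X, y)

print(Counter(y_under))
print(Counter(y_over))
```

On this toy data, undersampling leaves 10 rows per class and oversampling leaves 90 rows per class. Undersampling discards majority-class information, while oversampling repeats minority rows exactly, which is one reason synthetic sampling is often considered as an alternative.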
Pages: 8