HSDLM: A Hybrid Sampling With Deep Learning Method for Imbalanced Data Classification

Cited by: 34
Authors
Hasib, Khan Md [1]
Towhid, Nurul Akter [2]
Islam, Md Rafiqul [3]
Affiliations
[1] Ahsanullah Univ Sci & Engn, Dhaka, Bangladesh
[2] Jahangirnagar Univ, Dhaka, Bangladesh
[3] Univ Technol Sydney UTS, Sydney, NSW, Australia
Keywords
Class Imbalance; Classification; Deep Learning; ENN; LSTM; Sampling; SMOTE; SUPPORT
DOI
10.4018/IJCAC.2021100101
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
Imbalanced data presents many difficulties: most learners are biased toward the majority class and, in severe cases, may disregard the minority class entirely. Over the last few decades, class imbalance has been extensively researched using traditional machine learning techniques. However, there is relatively little analytical research on deep learning with class imbalance. In this article, the authors classify imbalanced data by combining a sampling method with a deep learning method. They propose a novel sampling-based deep learning method (HSDLM) to address the class imbalance problem. They preprocess the data with label encoding and remove noisy data with the under-sampling technique edited nearest neighbor (ENN). They also balance the data using the over-sampling technique SMOTE and apply three types of long short-term memory (LSTM) networks, a deep learning classifier, in parallel. The experimental findings indicate that HSDLM is a promising solution for strongly imbalanced datasets.
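The sampling stages named in the abstract (label encoding, ENN cleaning, SMOTE balancing) can be illustrated with a minimal, standard-library-only sketch. This is not the authors' implementation: the toy data, the function names, the choice of k = 3, and the decision to apply ENN to every class are all assumptions made for illustration. It also assumes every class keeps at least two samples after cleaning. The LSTM classification stage is omitted because it would need a deep learning framework.

```python
import math
import random
from collections import Counter


def label_encode(values):
    """Map each distinct category to an integer code (sorted order)."""
    mapping = {v: i for i, v in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


def k_nearest(X, i, k, candidates):
    """Indices of the k points among `candidates` closest to X[i] (excluding i)."""
    others = [j for j in candidates if j != i]
    others.sort(key=lambda j: math.dist(X[i], X[j]))
    return others[:k]


def enn(X, y, k=3):
    """Edited nearest neighbours: drop every point whose label disagrees
    with the majority vote of its k nearest neighbours (assumed variant)."""
    everyone = range(len(X))
    keep = []
    for i in everyone:
        votes = Counter(y[j] for j in k_nearest(X, i, k, everyone))
        if votes.most_common(1)[0][0] == y[i]:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


def smote(X, y, k=3, seed=0):
    """Oversample each smaller class up to the largest class size by
    interpolating between a sample and one of its k same-class neighbours."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        members = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - n):
            i = rng.choice(members)
            j = rng.choice(k_nearest(X, i, k, members))
            gap = rng.random()  # position along the segment from X[i] to X[j]
            X_out.append(tuple(a + gap * (b - a) for a, b in zip(X[i], X[j])))
            y_out.append(label)
    return X_out, y_out


# Hypothetical toy data: 8 clean "normal" points, 1 noisy "normal" point
# sitting inside the minority cluster, and 4 "attack" points.
raw_labels = ["normal"] * 9 + ["attack"] * 4
X = [
    (0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.3, 0.3),
    (0.0, 0.4), (0.4, 0.0), (0.2, 0.4), (0.4, 0.2),
    (5.1, 5.1),
    (5.0, 5.0), (5.2, 5.0), (5.0, 5.2), (5.2, 5.2),
]

y, mapping = label_encode(raw_labels)   # step 1: label encoding
X_clean, y_clean = enn(X, y, k=3)       # step 2: remove noisy samples
X_bal, y_bal = smote(X_clean, y_clean)  # step 3: balance the classes

print("encoding:", mapping)
print("after ENN:", Counter(y_clean))
print("after SMOTE:", Counter(y_bal))
```

In this sketch cleaning runs before oversampling, following the order in which the abstract lists the two steps, so that synthetic minority points are not interpolated from samples that ENN would have discarded. The balanced output would then be fed to the classifier stage.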
Pages: 1-13
Page count: 13