A selective ensemble learning algorithm for imbalanced dataset

Cited by: 2
Authors
Hongle, Du [1, 2]
Yan, Zhang [1, 2]
Gang, Ke [3]
Affiliations
[1] Shangluo Univ, Sch Math & Comp Applicat, Shangluo, Peoples R China
[2] Shangluo Publ Big Data Res Ctr, Shangluo, Peoples R China
[3] Dongguan Polytech, Dongguan, Peoples R China
Keywords
Imbalanced data; Under sampling; Selective ensemble learning; Network intrusion detection
DOI
10.1007/s12652-021-03453-w
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Network intrusion behavior data is imbalanced: it contains a large amount of normal behavior data and only a small amount of intrusion behavior data. Applied to such data, traditional selective ensemble learning algorithms yield a high false negative rate. This paper proposes a selective ensemble learning algorithm for imbalanced data based on under-sampling (SELAUS). First, the algorithm uses the Bootstrap method to draw, from the majority class, as many samples as there are minority class samples, and so constructs multiple balanced training subsets. Then, to ensure that the resulting base classifiers differ substantially from one another, several features are randomly selected on each training subset and a decision tree is built as the base classifier using the CART algorithm. Because this procedure can also produce some poorly performing base classifiers, the algorithm selects a subset of the base classifiers to integrate rather than integrating all of them. To evaluate the generalization error of a classifier accurately on an imbalanced dataset, the paper defines a performance evaluation method for imbalanced datasets and a method for evaluating the difference between base classifiers. The generalization error of each base classifier is then calculated, and base classifiers are selected according to it. In the weighted-voting integration, the weight of each base classifier is computed with a weight calculation method designed for imbalanced data. Finally, the validity of the algorithm is verified on UCI datasets, and the algorithm is applied to network intrusion detection. The simulation results show that the algorithm improves the detection rate of minority class samples, that is, it reduces the false negative rate.
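The pipeline described in the abstract (balanced bootstrap under-sampling, random feature subsets, tree base classifiers, error-based selection, weighted voting) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not give the paper's evaluation measure, diversity measure, or weight formula, so the sketch makes these assumptions: a depth-1 Gini split stands in for a full CART tree, balanced error (one minus the mean per-class recall) stands in for the generalization error, the better half of the candidates is kept, each weight is one minus the error, and the difference measure between base classifiers is left out. All function names, parameters, and the toy data are hypothetical.

```python
import random

def majority_label(labels):
    """Most common binary label (ties go to class 1)."""
    return int(2 * sum(labels) >= len(labels))

def fit_stump(X, y, features):
    """Depth-1 Gini split; a tiny stand-in for a full CART tree (assumption)."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            # Size-weighted Gini impurity of the two children (labels are 0/1).
            score = 0.0
            for side in (left, right):
                p = sum(side) / len(side)
                score += len(side) * 2 * p * (1 - p)
            if best is None or score < best[0]:
                best = (score, f, t, majority_label(left), majority_label(right))
    if best is None:  # no usable split: fall back to a constant prediction
        label = majority_label(y)
        return (features[0], float("inf"), label, label)
    return best[1:]  # (feature, threshold, left_label, right_label)

def stump_predict(stump, row):
    f, t, left_label, right_label = stump
    return left_label if row[f] <= t else right_label

def balanced_error(stump, X, y):
    """1 - mean per-class recall; an assumed imbalance-aware error measure."""
    hits, totals = {0: 0, 1: 0}, {0: 0, 1: 0}
    for row, lab in zip(X, y):
        totals[lab] += 1
        hits[lab] += stump_predict(stump, row) == lab
    return 1 - 0.5 * (hits[0] / totals[0] + hits[1] / totals[1])

def train_sketch(X, y, n_estimators=15, n_features=2, seed=0):
    """Minority class is labelled 1, majority class 0."""
    rng = random.Random(seed)
    minority = [i for i, lab in enumerate(y) if lab == 1]
    majority = [i for i, lab in enumerate(y) if lab == 0]
    candidates = []
    for _ in range(n_estimators):
        # Bootstrap as many majority samples as there are minority samples.
        idx = minority + [rng.choice(majority) for _ in minority]
        feats = rng.sample(range(len(X[0])), n_features)  # random feature subset
        stump = fit_stump([X[i] for i in idx], [y[i] for i in idx], feats)
        candidates.append((balanced_error(stump, X, y), stump))
    # Selection (assumed rule): keep the half with the lowest error.
    candidates.sort(key=lambda c: c[0])
    kept = candidates[: max(1, n_estimators // 2)]
    # Weighting (assumed rule): weight = 1 - balanced error.
    return [(1.0 - err, stump) for err, stump in kept]

def predict(ensemble, row):
    """Weighted vote: class 1 counts as +1, class 0 as -1."""
    vote = sum(w * (1 if stump_predict(s, row) == 1 else -1) for w, s in ensemble)
    return 1 if vote > 0 else 0

# Toy imbalanced data: 100 majority rows, 10 minority rows; feature 2 is noise.
X = [[i / 100, (i * 37 % 100) / 100, (i * 53 % 100) / 100] for i in range(100)]
X += [[2 + j / 10, 2 + (j * 3 % 10) / 10, (j * 7 % 10) / 10] for j in range(10)]
y = [0] * 100 + [1] * 10
ensemble = train_sketch(X, y)
print(len(ensemble), predict(ensemble, [2.5, 2.5, 0.5]))
```

Scoring every candidate with a per-class measure rather than plain accuracy is what keeps the majority class from dominating the selection step; the paper's own measures may differ from the ones assumed here.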
Pages: 10
Related papers
50 in total
  • [1] Selective Ensemble Learning Algorithm for Imbalanced Dataset
    Du, Hongle
    Zhang, Yan
    Zhang, Lin
    Chen, Yeh-Cheng
    COMPUTER SCIENCE AND INFORMATION SYSTEMS, 2023, 20 (02) : 831 - 856
  • [2] Dynamic weighted selective ensemble learning algorithm for imbalanced data streams
    Zhang Yan
    Du Hongle
    Ke Gang
    Zhang Lin
    Yeh-Cheng Chen
    The Journal of Supercomputing, 2022, 78 : 5394 - 5419
  • [3] Dynamic weighted selective ensemble learning algorithm for imbalanced data streams
    Yan, Zhang
    Du Hongle
    Gang, Ke
    Lin, Zhang
    Chen, Yeh-Cheng
    JOURNAL OF SUPERCOMPUTING, 2022, 78 (04) : 5394 - 5419
  • [4] Imbalanced Ensemble Classifier for Learning from Imbalanced Business School Dataset
    Chakraborty, Tanujit
    INTERNATIONAL JOURNAL OF MATHEMATICAL ENGINEERING AND MANAGEMENT SCIENCES, 2019, 4 (04) : 861 - 869
  • [5] A Diverse Meta Learning Ensemble Technique to Handle Imbalanced Microarray Dataset
    Dash, Sujata
    ADVANCES IN NATURE AND BIOLOGICALLY INSPIRED COMPUTING, 2016, 419 : 1 - 13
  • [6] Online ensemble learning algorithm for imbalanced data stream
    Hongle, Du
    Yan, Zhang
    Gang, Ke
    Lin, Zhang
    Chen, Yeh-Cheng
    APPLIED SOFT COMPUTING, 2021, 107
  • [7] A selective evolutionary heterogeneous ensemble algorithm for classifying imbalanced data
    An, Xiaomeng
    Xu, Sen
    ELECTRONIC RESEARCH ARCHIVE, 2023, 31 (05) : 2733 - 2757
  • [8] Selective ensemble algorithm for imbalanced underwater acoustic target data
    Cheng Y.
    Zhang Z.
    Li H.
    Liu Z.
    Zhang, Zongtang (qtxy_robin@126.com), 1600, Editorial Board of Journal of Harbin Engineering (41): : 1553 - 1558
  • [9] A Classification Algorithm Based on Ensemble Feature Selections for Imbalanced-Class Dataset
    Yin, Hua
    Gai, Keke
    Wang, Zhijian
    2016 IEEE 2ND INTERNATIONAL CONFERENCE ON BIG DATA SECURITY ON CLOUD (BIGDATASECURITY), IEEE INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE AND SMART COMPUTING (HPSC), AND IEEE INTERNATIONAL CONFERENCE ON INTELLIGENT DATA AND SECURITY (IDS), 2016, : 245 - 249
  • [10] A hybrid data-level ensemble to enable learning from highly imbalanced dataset
    Chen, Zhi
    Duan, Jiang
    Kang, Li
    Qiu, Guoping
    INFORMATION SCIENCES, 2021, 554 : 157 - 176