Imbalanced data classification: A KNN and generative adversarial networks-based hybrid approach for intrusion detection

Citations: 127
Authors
Ding, Hongwei [1 ,2 ]
Chen, Leiyang [1 ]
Dong, Liang [1 ]
Fu, Zhongwang [1 ]
Cui, Xiaohui [1 ,2 ]
Affiliations
[1] Wuhan Univ, Sch Cyber Sci & Engn, Wuhan, Peoples R China
[2] Wuhan Univ, Key Lab Aerosp Informat Secur & Trusted Comp, Minist Educ, Wuhan, Peoples R China
Source
FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE | 2022 / Vol. 131
Funding
National Key Research and Development Program;
Keywords
Intrusion detection; Class imbalance; K-nearest neighbor; Generative adversarial network; TACGAN; LEARNING APPROACH; IDS;
DOI
10.1016/j.future.2022.01.026
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202 ;
Abstract
As new network attacks continue to emerge, ensuring network security is becoming increasingly important. Intrusion detection, one of the key technologies for network security, has been widely studied. However, it faces the challenging problem of class imbalance: normal data far outnumber attack data. Class imbalance shifts the decision boundary, so the higher-value attack data are misclassified. How to make a classification model perform effectively on imbalanced data is known as the imbalanced learning problem. In this study, we propose a tabular data sampling method to address the imbalanced learning problem by balancing the normal samples and attack samples. First, for normal samples, the K-nearest neighbor method is used for effective undersampling while minimizing the loss of sample information. Then, we design a tabular auxiliary classifier generative adversarial network model (TACGAN) for oversampling attack samples. The TACGAN model is an extension of the ACGAN model. We add two loss functions to the generator to measure the information loss between real data and generated data, which makes TACGAN more suitable for generating tabular data. Finally, the undersampled normal data and the oversampled attack data are mixed to balance the data set. We carried out verification experiments on three real intrusion detection data sets. Experimental results show that the proposed method achieves excellent results in Accuracy, F1, AUC and Recall. (c) 2022 Elsevier B.V. All rights reserved.
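
The abstract does not say how the K-nearest neighbor step selects which normal samples to discard. As a rough illustration only, the sketch below uses a rule in the style of edited nearest neighbours: a normal sample is dropped when most of its k nearest neighbours are attack samples. The function name `knn_undersample`, the parameter `k`, and the toy data are assumptions made for this example and are not taken from the paper; the TACGAN oversampling step is not shown.

```python
import math

def knn_undersample(normal, attack, k=3):
    """Keep only the normal samples whose k nearest neighbours are mostly normal.

    Illustrative rule (an assumption, not the paper's exact procedure).
    """
    # Label 0 = normal, 1 = attack; normal samples come first, so index i
    # in `normal` is also index i in `labelled`.
    labelled = [(x, 0) for x in normal] + [(x, 1) for x in attack]
    kept = []
    for i, x in enumerate(normal):
        # Euclidean distance from x to every other sample (x itself is skipped).
        others = [(math.dist(x, y), label)
                  for j, (y, label) in enumerate(labelled) if j != i]
        others.sort(key=lambda pair: pair[0])
        attack_votes = sum(label for _, label in others[:k])
        if attack_votes <= k // 2:  # neighbourhood is mostly normal: keep
            kept.append(x)
    return kept

# Toy data: four normal points in one cluster, one normal point sitting
# inside the attack cluster.
normal = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (1.0, 1.0)]
attack = [(1.0, 1.1), (1.1, 1.0), (0.9, 1.0)]
kept = knn_undersample(normal, attack, k=3)
```

In this toy case the normal point at (1.0, 1.0) is surrounded by attack samples and is removed, while the four clustered normal points are kept. In a full pipeline along the lines the abstract describes, the retained normal samples would then be mixed with generated attack samples.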
Pages: 240 / 254
Number of pages: 15