Imbalanced data classification: A KNN and generative adversarial networks-based hybrid approach for intrusion detection

Cited by: 127
Authors
Ding, Hongwei [1, 2]
Chen, Leiyang [1]
Dong, Liang [1]
Fu, Zhongwang [1]
Cui, Xiaohui [1, 2]
Affiliations
[1] Wuhan Univ, Sch Cyber Sci & Engn, Wuhan, Peoples R China
[2] Wuhan Univ, Key Lab Aerosp Informat Secur & Trusted Comp, Minist Educ, Wuhan, Peoples R China
Source
FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE | 2022 / Vol. 131
Funding
National Key Research and Development Program of China;
Keywords
Intrusion detection; Class imbalance; K-nearest neighbor; Generative adversarial network; TACGAN; LEARNING APPROACH; IDS;
DOI
10.1016/j.future.2022.01.026
Chinese Library Classification number
TP301 [Theory and methods];
Discipline classification code
081202;
Abstract
As network attacks continue to emerge, ensuring network security is becoming increasingly important. Intrusion detection, one of the key technologies for network security, has been widely studied. However, intrusion detection data are typically class-imbalanced: normal data far outnumber attack data. Class imbalance shifts the decision boundary, so the more valuable attack data are more likely to be misclassified. Making a classification model perform effectively on imbalanced data is known as the imbalanced learning problem. In this study, we propose a tabular data sampling method that addresses this problem by balancing the normal and attack samples. First, the normal samples are undersampled with the K-nearest neighbor method while minimizing the loss of sample information. Then, we design a tabular auxiliary classifier generative adversarial network (TACGAN) model to oversample the attack samples. TACGAN extends the ACGAN model: we add two loss functions to the generator that measure the information loss between real and generated data, which makes TACGAN better suited to generating tabular data. Finally, the undersampled normal data and the oversampled attack data are combined into a balanced data set. We carried out verification experiments on three real intrusion detection data sets. Experimental results show that the proposed method achieves excellent results in Accuracy, F1, AUC and Recall. (c) 2022 Elsevier B.V. All rights reserved.
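The abstract does not spell out the exact KNN rule used for undersampling, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a rule close to Edited Nearest Neighbours, swapped in for illustration: a normal sample is dropped when most of its k nearest neighbours are attack samples. The function name `knn_undersample`, the parameter `k`, and the toy data are all hypothetical.

```python
import math


def knn_undersample(normal, attack, k=3):
    """Keep only the normal samples whose k nearest neighbours are mostly normal.

    Assumed rule (not taken from the paper): a normal sample surrounded
    mainly by attack samples blurs the decision boundary, so it is dropped.
    """
    # Label 0 = normal, 1 = attack; normal samples come first, so the
    # index i of a normal sample is also its index in `labelled`.
    labelled = [(x, 0) for x in normal] + [(x, 1) for x in attack]
    kept = []
    for i, x in enumerate(normal):
        # Distance from x to every other sample (x itself is skipped).
        others = [(math.dist(x, y), label)
                  for j, (y, label) in enumerate(labelled) if j != i]
        others.sort(key=lambda pair: pair[0])
        attack_votes = sum(label for _, label in others[:k])
        if attack_votes * 2 < k:  # fewer than half of the neighbours are attacks
            kept.append(x)
    return kept


# Toy data: four normal points near the origin, one normal point that
# sits inside a small cluster of attack points.
normal = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
attack = [(5.1, 5.0), (5.0, 5.1), (4.9, 5.0)]

kept = knn_undersample(normal, attack, k=3)
print(len(normal), "normal samples before,", len(kept), "after")
```

In this toy run the normal point at (5.0, 5.0) has only attack samples among its three nearest neighbours and is removed, while the cluster near the origin is kept. The oversampling half of the method (TACGAN) is not sketched here, because the abstract gives no details of its two additional generator losses.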
Pages: 240-254
Number of pages: 15