An Effective Feature Selection Algorithm for Machine Learning-based Malicious Traffic Detection

Citations: 0
Authors
Fei, Chao [1]
Xia, Nian [1]
Tsai, Pang-Wei [2]
Lu, Yang [1]
Pan, Xiaonan [3]
Gong, Junli [4]
Affiliations
[1] Nanjing Normal Univ, Sch Comp & Elect Informat, Sch Artificial Intelligence, Nanjing, Peoples R China
[2] Natl Cheng Kung Univ, Dept Elect Engn, Tainan, Taiwan
[3] Xian Jiaotong Liverpool Univ, Sch Adv Technol, Suzhou, Peoples R China
[4] Beijing Normal Univ Hong Kong Baptist Univ United, Comp Sci & Technol, Zhuhai, Peoples R China
Source
2024 19TH ASIA JOINT CONFERENCE ON INFORMATION SECURITY, ASIAJCIS 2024 | 2024
Keywords
Malicious traffic detection; Feature selection; Machine Learning; Internet of Things;
DOI
10.1109/AsiaJCIS64263.2024.00024
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
Malicious traffic detection is important for defending against network attacks. Traditional machine learning-based malicious traffic detection methods aim to improve detection performance and ignore resource consumption. For Internet of Things devices with constrained resources, reducing the resource consumption of traffic detection while maintaining detection accuracy is challenging. To address this problem, this paper proposes an effective feature selection algorithm based on the chi-squared test (EFS-CST), which selects the most relevant features for training machine learning (ML) models such as random forest, naive Bayes, decision tree, and convolutional neural networks. The proposed algorithm is evaluated in terms of accuracy, precision, F1 score, AUROC (area under the receiver operating characteristic curve), AUC-PR (area under the precision-recall curve), model size, training time, and dataset size. The results show that ML models with EFS-CST achieve detection performance similar to that of ML models trained on all features, on both the UNSW-NB15 and TII-SSRC-23 datasets. Some ML models with EFS-CST even outperform their counterparts without feature selection. The dataset sizes for TII-SSRC-23 and UNSW-NB15 are reduced by 48.8% and 47.8%, respectively. In addition, the model training time on the TII-SSRC-23 and UNSW-NB15 datasets is reduced by up to 86.1% and 48.9%, respectively. Finally, the model sizes for the TII-SSRC-23 and UNSW-NB15 datasets are reduced by up to 60.0% and 65.6%, respectively.
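The abstract does not describe the internals of EFS-CST, so the following is only a minimal sketch of generic chi-squared feature ranking, not the authors' algorithm. It assumes non-negative feature values (counts or frequencies) and scores each feature by comparing its per-class sum with the sum expected if the feature were independent of the class. The function names `chi2_scores` and `select_top_k` and the toy data are hypothetical, introduced only for illustration.

```python
from collections import defaultdict


def chi2_scores(X, y):
    """Return one chi-squared score per feature column of X against labels y.

    X is a list of rows with non-negative numeric features; y holds class labels.
    """
    n_samples = len(X)
    n_features = len(X[0])
    class_counts = defaultdict(int)
    observed = defaultdict(lambda: [0.0] * n_features)

    # Observed value: sum of each feature within each class.
    for row, label in zip(X, y):
        class_counts[label] += 1
        for j, value in enumerate(row):
            observed[label][j] += value

    feature_totals = [sum(row[j] for row in X) for j in range(n_features)]

    scores = []
    for j in range(n_features):
        score = 0.0
        for label, count in class_counts.items():
            # Expected value if feature j were independent of the class.
            expected = feature_totals[j] * count / n_samples
            if expected > 0:
                score += (observed[label][j] - expected) ** 2 / expected
        scores.append(score)
    return scores


def select_top_k(X, y, k):
    """Keep the k highest-scoring columns; return their indices and the reduced rows."""
    scores = chi2_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    keep = sorted(ranked[:k])
    return keep, [[row[j] for j in keep] for row in X]


if __name__ == "__main__":
    # Toy data (hypothetical): column 1 is constant and carries no class information.
    X_toy = [[5, 1, 0], [4, 1, 0], [0, 1, 3], [0, 1, 4]]
    y_toy = [1, 1, 0, 0]
    keep, X_reduced = select_top_k(X_toy, y_toy, k=2)
    print(keep, X_reduced)
```

Dropping low-scoring columns in this way shrinks the training data before any model is fitted, which is the general mechanism behind the dataset-size, training-time, and model-size reductions the abstract reports; the actual selection rule used by EFS-CST may differ.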
Pages: 91-98
Page count: 8
Related papers
25 in total
[1]  
Agoramoorthy Moorthy, 2023, 2023 Intelligent Computing and Control for Engineering and Business Systems (ICCEBS), P1, DOI 10.1109/ICCEBS58601.2023.10449209
[2]   Malware Detection Using Deep Learning and Correlation-Based Feature Selection [J].
Alomari, Esraa Saleh ;
Nuiaa, Riyadh Rahef ;
Alyasseri, Zaid Abdi Alkareem ;
Mohammed, Husam Jasim ;
Sani, Nor Samsiah ;
Esa, Mohd Isrul ;
Musawi, Bashaer Abbuod .
SYMMETRY-BASEL, 2023, 15 (01)
[3]   Towards a generalized hybrid deep learning model with optimized hyperparameters for malicious traffic detection in the Industrial Internet of Things [J].
Babayigit, Bilal ;
Abubaker, Mohammed .
ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, 128
[4]   A Survey of Man In The Middle Attacks [J].
Conti, Mauro ;
Dragoni, Nicola ;
Lesyk, Viktor .
IEEE COMMUNICATIONS SURVEYS AND TUTORIALS, 2016, 18 (03) :2027-2051
[5]   Detection of Encrypted Malicious Network Traffic using Machine Learning [J].
De Lucia, Michael J. ;
Cotton, Chase .
MILCOM 2019 - 2019 IEEE MILITARY COMMUNICATIONS CONFERENCE (MILCOM), 2019,
[6]   AN INTRUSION-DETECTION MODEL [J].
DENNING, DE .
IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 1987, 13 (02) :222-232
[7]   Malicious Network Traffic Detection Based on Deep Neural Networks and Association Analysis [J].
Gao, Minghui ;
Ma, Li ;
Liu, Heng ;
Zhang, Zhijun ;
Ning, Zhiyan ;
Xu, Jian .
SENSORS, 2020, 20 (05)
[8]   Evaluation of Different Machine Learning Methods and Deep-Learning Convolutional Neural Networks for Landslide Detection [J].
Ghorbanzadeh, Omid ;
Blaschke, Thomas ;
Gholamnia, Khalil ;
Meena, Sansar Raj ;
Tiede, Dirk ;
Aryal, Jagannath .
REMOTE SENSING, 2019, 11 (02)
[9]   Attack and anomaly detection in IoT sensors in IoT sites using machine learning approaches [J].
Hasan, Mahmudul ;
Islam, Md. Milon ;
Zarif, Md Ishrak Islam ;
Hashem, M. M. A. .
INTERNET OF THINGS, 2019, 7
[10]   TII-SSRC-23 Dataset: Typological Exploration of Diverse Traffic Patterns for Intrusion Detection [J].
Herzalla, Dania ;
Lunardi, Willian Tessaro ;
Andreoni, Martin .
IEEE ACCESS, 2023, 11 :118577-118594