BoAu: Malicious traffic detection with noise labels based on boundary augmentation

Cited by: 13
Authors
Yuan, Qingjun [1, 2]
Liu, Chang [3, 4]
Yu, Wentao [1, 5]
Zhu, Yuefei [1, 2]
Xiong, Gang [3, 4]
Wang, Yongjuan [1, 2]
Gou, Gaopeng [3, 4]
Affiliations
[1] Strateg Support Force Informat Engn Univ, Zhengzhou 450001, Peoples R China
[2] Henan Key Lab Network Cryptog Technol, Zhengzhou 450001, Peoples R China
[3] Chinese Acad Sci, Inst Informat Engn, Beijing 100093, Peoples R China
[4] Univ Chinese Acad Sci, Sch Cyber Secur, Beijing 100093, Peoples R China
[5] Chinese Acad Sci, Inst Comp Technol, Beijing 100190, Peoples R China
Keywords
Malicious traffic detection; Deep learning; Learning with noise labels; Decision boundaries; Encrypted traffic; CLASSIFICATION; NETWORK;
DOI
10.1016/j.cose.2023.103300
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The effectiveness of deep-learning-based malicious traffic detection systems relies on high-quality labeled traffic datasets. However, malicious traffic labeling approaches can easily lead to incorrect labeling, which can have a harmful impact on models. To this end, various methods for learning with noise labels have been proposed. They exclude suspected wrong samples from model updates to ensure accuracy. However, this also removes hard samples, resulting in poor model decision boundaries and the loss of ability to classify hard samples. In this paper, we propose a boundary-augmentation-based approach for malicious traffic identification named BoAu. Unlike other approaches, BoAu treats all samples, including hard samples, equally during training to construct more accurate decision boundaries and thus improve accuracy. Meanwhile, a decision boundary augmentation module is designed to mitigate the impact of mislabeled hard samples on decision boundary generation. The decision boundary augmentation module adaptively adjusts the losses of hard samples based on their distance from the cluster to which their labels belong and other clusters, thus driving the shared feature representation network to fit the true label distribution. We validated BoAu in identifying malicious traffic with noise labels on a dataset covering 22 classes of realistic encrypted malicious traffic. Experimental results showed that even under scenarios with up to 90% noise labels, the classification accuracy was still over 80%, which was better than the state-of-the-art approaches. In addition, we validated the applicability of BoAu on several public datasets, including CIC-IDS-2017 and IoT-23. © 2023 Published by Elsevier Ltd.
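The abstract says only that losses of hard samples are adjusted by their distance to the cluster of their own label and to other clusters; it does not give the formula. The sketch below is one possible reading of that idea, not the authors' implementation. The function names `boundary_weights` and `weighted_cross_entropy`, the use of class centroids as cluster representatives, and the ratio `d_other / (d_own + d_other)` are all assumptions made for illustration.

```python
import numpy as np


def boundary_weights(features, labels, num_classes, eps=1e-12):
    """Hypothetical per-sample loss weights in [0, 1].

    Assumption: each cluster is summarised by the centroid of the samples
    carrying that (possibly noisy) label, and every class has at least one
    sample. A sample far from its own centroid but near another one gets a
    small weight, so a likely mislabeled hard sample pulls less on the loss.
    """
    centroids = np.stack(
        [features[labels == k].mean(axis=0) for k in range(num_classes)]
    )
    # Pairwise Euclidean distances, shape (num_samples, num_classes).
    dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
    idx = np.arange(len(labels))
    d_own = dists[idx, labels]
    # Mask out the own-label column to find the nearest *other* centroid.
    masked = dists.copy()
    masked[idx, labels] = np.inf
    d_other = masked.min(axis=1)
    return d_other / (d_own + d_other + eps)


def weighted_cross_entropy(logits, labels, weights):
    """Cross-entropy averaged with the given per-sample weights."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    nll = -log_probs[np.arange(len(labels)), labels]
    return float((weights * nll).sum() / weights.sum())


if __name__ == "__main__":
    # Toy 2-D features: the last sample sits in cluster 1 but is labeled 0.
    feats = np.array([[0.0, 0.0], [0.2, 0.0], [4.0, 0.0], [4.2, 0.0], [3.9, 0.0]])
    labs = np.array([0, 0, 1, 1, 0])
    print(boundary_weights(feats, labs, num_classes=2))
```

Note that no sample is discarded here: every sample still contributes to the loss, only with a different weight, which matches the abstract's point that hard samples stay in training rather than being filtered out.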
Pages: 13