NetKD: Towards Resource-Efficient Encrypted Traffic Classification Using Knowledge Distillation for Language Models

Cited by: 0
Authors
Ma, Jiaji [1 ]
Li, Xiangge [1 ]
Luo, Hong [1 ]
Sun, Yan [1 ]
Affiliations
[1] Beijing University of Posts and Telecommunications, Beijing Key Laboratory of Intelligent Telecommunications Software and Multimedia, Beijing, People's Republic of China
Source
PROCEEDINGS OF THE 2024 27TH INTERNATIONAL CONFERENCE ON COMPUTER SUPPORTED COOPERATIVE WORK IN DESIGN, CSCWD 2024 | 2024
Funding
National Natural Science Foundation of China;
Keywords
Encrypted Traffic Classification; Knowledge Distillation; Transformer; BERT;
DOI
10.1109/CSCWD61410.2024.10580837
CLC Number
TP39 [Computer Applications];
Discipline Codes
081203; 0835;
Abstract
Encrypted traffic classification is a challenging task characterized by content invisibility, class imbalance, and a limited number of labeled samples. Pre-trained language models have achieved significant success in this area. However, deploying language models for encrypted traffic classification on resource-constrained devices requires reducing model size and inference cost while preserving accuracy. In this paper, we propose NetKD, an unsupervised, task-agnostic, transformer-based knowledge distillation method for encrypted traffic data. It leverages Multi-Head Self-Attention Relation Alignment, Masked BURST Model, and Same-origin BURST Prediction tasks to transfer the byte-level contextual knowledge of encrypted traffic learned by a large-scale "teacher" BERT model to a compact "student" NetKD-BERT model. The student model requires only a small amount of supervised data for fine-tuning before it can be applied to downstream tasks. Experiments show that, compared to the state-of-the-art baseline model, our model uses only 4.61% of the parameters while achieving on average 99.10% of its F1 score across four downstream tasks, with a classification speed of 0.93 ms/packet. Its memory footprint is as low as 7.33% of the baseline's, and its inference is 14.3 times faster.
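To make the Multi-Head Self-Attention Relation Alignment objective concrete: the idea is to have the student match the teacher's per-head attention relation distributions rather than its raw weights. Below is a minimal PyTorch sketch of one plausible formulation in the style of MiniLM-type relation distillation; the function names, the single query-query relation, and the shared head count are illustrative assumptions, not the paper's exact objective.

    import torch
    import torch.nn.functional as F

    def relation_logits(hidden: torch.Tensor, num_heads: int) -> torch.Tensor:
        # Split the layer's hidden states into heads and compute scaled
        # dot-product self-relations; output shape is (batch, heads, seq, seq).
        bsz, seq_len, dim = hidden.shape
        head_dim = dim // num_heads
        h = hidden.view(bsz, seq_len, num_heads, head_dim).transpose(1, 2)
        return h @ h.transpose(-1, -2) / head_dim ** 0.5

    def mhsa_relation_alignment_loss(teacher_hidden: torch.Tensor,
                                     student_hidden: torch.Tensor,
                                     num_heads: int = 12) -> torch.Tensor:
        # KL divergence between the teacher's and the student's self-attention
        # relation distributions. Because relations are seq x seq, the teacher
        # and the student may use different hidden widths with no projection.
        t_rel = F.softmax(relation_logits(teacher_hidden, num_heads), dim=-1).detach()
        s_log = F.log_softmax(relation_logits(student_hidden, num_heads), dim=-1)
        return F.kl_div(s_log, t_rel, reduction="batchmean")

    # Toy usage: a 768-dim teacher layer distilled into a 384-dim student layer.
    teacher_h = torch.randn(8, 128, 768)  # (batch, seq_len, teacher_dim)
    student_h = torch.randn(8, 128, 384)  # (batch, seq_len, student_dim)
    loss = mhsa_relation_alignment_loss(teacher_h, student_h)

Because the relation matrices are sequence-by-sequence, this loss places no constraint on the student's hidden size, which is what lets a compact student (here, 4.61% of the teacher's parameters) be trained against a full-size teacher without projection layers.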
Pages: 3011-3016
Page count: 6