NetKD: Towards Resource-Efficient Encrypted Traffic Classification Using Knowledge Distillation for Language Models

Cited by: 0
Authors
Ma, Jiaji [1 ]
Li, Xiangge [1 ]
Luo, Hong [1 ]
Sun, Yan [1 ]
Affiliations
[1] Beijing University of Posts and Telecommunications, Beijing Key Laboratory of Intelligent Telecommunications Software and Multimedia, Beijing, People's Republic of China
Source
PROCEEDINGS OF THE 2024 27TH INTERNATIONAL CONFERENCE ON COMPUTER SUPPORTED COOPERATIVE WORK IN DESIGN, CSCWD 2024 | 2024
Funding
National Natural Science Foundation of China;
Keywords
Encrypted Traffic Classification; Knowledge Distillation; Transformer; BERT;
DOI
10.1109/CSCWD61410.2024.10580837
CLC Number
TP39 [Computer Applications];
Discipline Codes
081203; 0835;
Abstract
Encrypted traffic classification is a challenging task characterized by content invisibility, class imbalance, and a limited number of labeled samples. Pre-trained language models have achieved significant success in this area. However, deploying language models for encrypted traffic classification on resource-constrained devices requires reducing model size and inference cost while preserving accuracy. In this paper, we propose NetKD, an unsupervised, task-agnostic, transformer-based knowledge distillation method for encrypted traffic data. It leverages Multi-Head Self-Attention Relation Alignment, Masked BURST Model, and Same-origin BURST Prediction tasks to transfer the byte-level contextual knowledge of encrypted traffic learned by a large-scale "teacher" BERT model to a compact "student" NetKD-BERT model. The student model requires only a small amount of supervised data for fine-tuning before it can be applied to downstream tasks. Experiments show that, compared to the state-of-the-art baseline model, our model uses only 4.61% of the parameters while achieving on average 99.10% of its F1 score across four downstream tasks, with a classification speed of 0.93 ms/packet. Its memory footprint is as low as 7.33% of the baseline's, and its inference is 14.3 times faster.
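To make the Multi-Head Self-Attention Relation Alignment objective concrete: the idea is to have the student match the teacher's per-head attention relation distributions rather than its raw weights. Below is a minimal PyTorch sketch of one plausible formulation in the style of MiniLM-type relation distillation; the function names, the single query-query relation, and the shared head count are illustrative assumptions, not the paper's exact objective.

    import torch
    import torch.nn.functional as F

    def relation_logits(hidden: torch.Tensor, num_heads: int) -> torch.Tensor:
        # Split the layer's hidden states into heads and compute scaled
        # dot-product self-relations; output shape is (batch, heads, seq, seq).
        bsz, seq_len, dim = hidden.shape
        head_dim = dim // num_heads
        h = hidden.view(bsz, seq_len, num_heads, head_dim).transpose(1, 2)
        return h @ h.transpose(-1, -2) / head_dim ** 0.5

    def mhsa_relation_alignment_loss(teacher_hidden: torch.Tensor,
                                     student_hidden: torch.Tensor,
                                     num_heads: int = 12) -> torch.Tensor:
        # KL divergence between the teacher's and the student's self-attention
        # relation distributions. Because relations are seq x seq, the teacher
        # and the student may use different hidden widths with no projection.
        t_rel = F.softmax(relation_logits(teacher_hidden, num_heads), dim=-1).detach()
        s_log = F.log_softmax(relation_logits(student_hidden, num_heads), dim=-1)
        return F.kl_div(s_log, t_rel, reduction="batchmean")

    # Toy usage: a 768-dim teacher layer distilled into a 384-dim student layer.
    teacher_h = torch.randn(8, 128, 768)  # (batch, seq_len, teacher_dim)
    student_h = torch.randn(8, 128, 384)  # (batch, seq_len, student_dim)
    loss = mhsa_relation_alignment_loss(teacher_h, student_h)

Because the relation matrices are sequence-by-sequence, this loss places no constraint on the student's hidden size, which is what lets a compact student (here, 4.61% of the teacher's parameters) be trained against a full-size teacher without projection layers.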
Pages: 3011-3016
Page count: 6