Self-Supervised Latent Representations of Network Flows and Application to Darknet Traffic Classification

Cited by: 3
Authors
Zakroum, Mehdi [1 ,2 ,3 ,4 ]
Francois, Jerome [2 ]
Ghogho, Mounir [1 ]
Chrisment, Isabelle [3 ,4 ]
Affiliations
[1] Int Univ Rabat, TIC Lab, Rabat 111000, Morocco
[2] Inria, F-54600 Nancy, France
[3] Univ Lorraine, F-54052 Nancy, France
[4] LORIA, F-54506 Nancy, France
Keywords
Self-supervised learning; unsupervised learning; graph neural networks; graph auto-encoders; anonymous walk embedding; graph embedding; network flows; network probing; network telescope; Darknet; botnet
DOI
10.1109/ACCESS.2023.3263206
CLC classification
TP [Automation technology; computer technology]
Subject classification code
0812
Abstract
Characterizing network flows is essential for security operators to enhance their awareness of cyber-threats targeting their networks. The automation of network flow characterization with machine learning has received much attention in recent years. To this end, raw network flows must be transformed into structured, exploitable data. In this work, we propose a method to encode raw network flows into robust latent representations that can be exploited in downstream tasks. First, raw network flows are transformed into graph-structured objects that capture both their topology (packet-wise transitional patterns) and their features (protocols used, packet flags, etc.). Then, using self-supervised techniques, namely Graph Auto-Encoders and Anonymous Walk Embeddings, each network flow graph is encoded into a latent representation that encapsulates both the structure of the graph and the features of its nodes while minimizing information loss. The result is semantically rich, robust representation vectors that machine learning algorithms can consume to perform downstream network-related tasks. To evaluate our network flow embedding models, we use probing flows captured with two /20 network telescopes and labeled using reports from different sources. The experimental results show that the proposed network flow embedding approach enables reliable classification of darknet probing activities. Furthermore, a comparison between our self-supervised approach and a fully supervised graph convolutional network shows that, when labeled data are limited, a downstream classifier that takes the derived latent representations as inputs outperforms the fully supervised graph convolutional network. This work has many applications in cybersecurity, such as network flow clustering, attack detection and prediction, malware detection, vulnerability exploit analysis, and inference of attackers' intentions.
Pages: 90749-90765
Page count: 17
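
To make the pipeline described in the abstract concrete, the following minimal sketch (Python, standard library only) illustrates two of its ingredients: a network flow is first turned into a directed transition graph over packet states, which is then summarized with an Anonymous Walk Embedding, i.e., the empirical distribution of anonymized random walks. The toy flow, the choice of TCP flags as node states, and all function names are illustrative assumptions, not the authors' implementation; the Graph Auto-Encoder component of the paper, which additionally encodes node features, is omitted here.

# Minimal sketch: flow-graph construction + Anonymous Walk Embedding.
# Hypothetical throughout: node states, toy flow, and function names
# are illustrative only, not the authors' code.
import random
from collections import Counter, defaultdict

def flow_to_graph(packet_states):
    """Directed transition graph: an edge for each consecutive pair of packet states."""
    graph = defaultdict(set)
    for src, dst in zip(packet_states, packet_states[1:]):
        graph[src].add(dst)
    return {node: sorted(nbrs) for node, nbrs in graph.items()}

def anonymize(walk):
    """Replace each node by the index of its first occurrence,
    e.g. ('SYN', 'RST', 'SYN') -> (0, 1, 0)."""
    first_seen = {}
    return tuple(first_seen.setdefault(node, len(first_seen)) for node in walk)

def awe(graph, walk_len=4, n_walks=1000, seed=0):
    """Estimate the distribution over anonymous walks by sampling random walks."""
    rng = random.Random(seed)
    start_nodes = list(graph)
    counts = Counter()
    for _ in range(n_walks):
        node = rng.choice(start_nodes)
        walk = [node]
        for _ in range(walk_len - 1):
            successors = graph.get(node)
            if not successors:      # dead end: keep the shorter walk
                break
            node = rng.choice(successors)
            walk.append(node)
        counts[anonymize(walk)] += 1
    total = sum(counts.values())
    # In practice one fixes a global enumeration of anonymous walks of
    # length walk_len so that vectors are aligned across flow graphs.
    return {pattern: counts[pattern] / total for pattern in sorted(counts)}

# Toy probing flow as a sequence of TCP flag states (hypothetical example).
flow = ["SYN", "SYN", "RST", "SYN", "SYN", "RST"]
print(awe(flow_to_graph(flow)))

Because anonymous walks abstract away node identities, two flows with the same transitional pattern but different concrete states map to nearby vectors, which is the property that makes such representations robust inputs for downstream classification.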