Self-Supervised Latent Representations of Network Flows and Application to Darknet Traffic Classification

Cited by: 3
Authors
Zakroum, Mehdi [1 ,2 ,3 ,4 ]
Francois, Jerome [2 ]
Ghogho, Mounir [1 ]
Chrisment, Isabelle [3 ,4 ]
Affiliations
[1] Int Univ Rabat, TIC Lab, Rabat 111000, Morocco
[2] Inria, F-54600 Nancy, France
[3] Univ Lorraine, F-54052 Nancy, France
[4] LORIA, F-54506 Nancy, France
Keywords
Self-supervised learning; unsupervised learning; graph neural networks; graph auto-encoders; anonymous walk embedding; graph embedding; network flows; network probing; network telescope; Darknet; botnet
DOI
10.1109/ACCESS.2023.3263206
CLC classification
TP [Automation technology; computer technology]
Subject classification code
0812
Abstract
Characterizing network flows is essential for security operators to enhance their awareness of cyber-threats targeting their networks. The automation of network flow characterization with machine learning has received much attention in recent years. To this end, raw network flows must be transformed into structured, exploitable data. In this work, we propose a method to encode raw network flows into robust latent representations that can be exploited in downstream tasks. First, raw network flows are transformed into graph-structured objects that capture both their topology (packet-wise transitional patterns) and their features (protocols used, packet flags, etc.). Then, using self-supervised techniques, namely Graph Auto-Encoders and Anonymous Walk Embeddings, each network flow graph is encoded into a latent representation that encapsulates both the structure of the graph and the features of its nodes while minimizing information loss. The result is semantically rich, robust representation vectors that machine learning algorithms can consume to perform downstream network-related tasks. To evaluate our network flow embedding models, we use probing flows captured with two /20 network telescopes and labeled using reports from different sources. The experimental results show that the proposed network flow embedding approach enables reliable classification of darknet probing activities. Furthermore, a comparison between our self-supervised approach and a fully supervised graph convolutional network shows that, when labeled data are limited, a downstream classifier that takes the derived latent representations as inputs outperforms the fully supervised graph convolutional network. This work has many applications in cybersecurity, such as network flow clustering, attack detection and prediction, malware detection, vulnerability exploit analysis, and inference of attackers' intentions.
Pages: 90749-90765
Page count: 17
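
To make the pipeline described in the abstract concrete, the following minimal sketch (Python, standard library only) illustrates two of its ingredients: a network flow is first turned into a directed transition graph over packet states, which is then summarized with an Anonymous Walk Embedding, i.e., the empirical distribution of anonymized random walks. The toy flow, the choice of TCP flags as node states, and all function names are illustrative assumptions, not the authors' implementation; the Graph Auto-Encoder component of the paper, which additionally encodes node features, is omitted here.

# Minimal sketch: flow-graph construction + Anonymous Walk Embedding.
# Hypothetical throughout: node states, toy flow, and function names
# are illustrative only, not the authors' code.
import random
from collections import Counter, defaultdict

def flow_to_graph(packet_states):
    """Directed transition graph: an edge for each consecutive pair of packet states."""
    graph = defaultdict(set)
    for src, dst in zip(packet_states, packet_states[1:]):
        graph[src].add(dst)
    return {node: sorted(nbrs) for node, nbrs in graph.items()}

def anonymize(walk):
    """Replace each node by the index of its first occurrence,
    e.g. ('SYN', 'RST', 'SYN') -> (0, 1, 0)."""
    first_seen = {}
    return tuple(first_seen.setdefault(node, len(first_seen)) for node in walk)

def awe(graph, walk_len=4, n_walks=1000, seed=0):
    """Estimate the distribution over anonymous walks by sampling random walks."""
    rng = random.Random(seed)
    start_nodes = list(graph)
    counts = Counter()
    for _ in range(n_walks):
        node = rng.choice(start_nodes)
        walk = [node]
        for _ in range(walk_len - 1):
            successors = graph.get(node)
            if not successors:      # dead end: keep the shorter walk
                break
            node = rng.choice(successors)
            walk.append(node)
        counts[anonymize(walk)] += 1
    total = sum(counts.values())
    # In practice one fixes a global enumeration of anonymous walks of
    # length walk_len so that vectors are aligned across flow graphs.
    return {pattern: counts[pattern] / total for pattern in sorted(counts)}

# Toy probing flow as a sequence of TCP flag states (hypothetical example).
flow = ["SYN", "SYN", "RST", "SYN", "SYN", "RST"]
print(awe(flow_to_graph(flow)))

Because anonymous walks abstract away node identities, two flows with the same transitional pattern but different concrete states map to nearby vectors, which is the property that makes such representations robust inputs for downstream classification.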