Generative Deep Learning for Internet of Things Network Traffic Generation

Cited by: 38
Authors
Shahid, Mustafizur R. [1]
Blanc, Gregory [1]
Jmila, Houda [1]
Zhang, Zonghua [2]
Debar, Herve [1]
Affiliations
[1] Telecom SudParis, Inst Polytech Paris, Paris, France
[2] IMT Lille Douai, Inst Mines Telecom, Lille, France
Source
2020 IEEE 25TH PACIFIC RIM INTERNATIONAL SYMPOSIUM ON DEPENDABLE COMPUTING (PRDC 2020) | 2020
Funding
European Union Horizon 2020;
Keywords
Deep Learning; Generative Adversarial Network; Autoencoder; Network Security; Internet of Things;
DOI
10.1109/PRDC50213.2020.00018
Chinese Library Classification (CLC) number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
The rapid development of the Internet of Things (IoT) has prompted recent interest in realistic IoT network traffic generation. Security practitioners need IoT network traffic data to develop and assess network-based intrusion detection systems (NIDS). Emulating realistic network traffic avoids the costly physical deployment of thousands of smart devices. From an attacker's perspective, generating network traffic that mimics the legitimate behavior of a device can be useful to evade NIDS. As network traffic data consist of sequences of packets, the problem is similar to the generation of sequences of categorical data, such as word-by-word text generation. Many solutions in the field of natural language processing have been proposed to adapt a Generative Adversarial Network (GAN) to generate sequences of categorical data. In this paper, we propose to combine an autoencoder with a GAN to generate sequences of packet sizes that correspond to bidirectional flows. First, the autoencoder is trained to learn a latent representation of the real sequences of packet sizes. A GAN is then trained on the latent space to learn to generate latent vectors that can be decoded into realistic sequences. For experimental purposes, bidirectional flows produced by a Google Home Mini are used, and the autoencoder is combined with a Wasserstein GAN. Comparison of different network characteristics shows that our proposed approach is able to generate sequences of packet sizes that behave similarly to real bidirectional flows. We also show that the synthetic bidirectional flows are close enough to the real ones that they can fool anomaly detectors into labeling them as legitimate.
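The abstract treats a bidirectional flow as a sequence of categorical tokens, much like words in a sentence, and trains a Wasserstein GAN on the autoencoder's latent space. The sketch below is only an illustration of those two ideas under stated assumptions, not the authors' implementation: the signed-size convention (positive for device to server, negative for server to device), the `PAD` token, the example flows, and all function names are hypothetical. The loss functions follow the standard Wasserstein GAN formulation, in which the critic outputs unbounded scores.

```python
from typing import Dict, List, Sequence

PAD = 0  # hypothetical padding token for flows shorter than max_len


def build_vocab(flows: Sequence[Sequence[int]]) -> Dict[int, int]:
    """Map each distinct signed packet size to a categorical index (0 is PAD)."""
    sizes = sorted({size for flow in flows for size in flow})
    return {size: idx for idx, size in enumerate(sizes, start=1)}


def encode(flow: Sequence[int], vocab: Dict[int, int], max_len: int) -> List[int]:
    """Turn one flow into a fixed-length token sequence (truncate, then pad)."""
    tokens = [vocab[size] for size in flow[:max_len]]
    return tokens + [PAD] * (max_len - len(tokens))


def decode(tokens: Sequence[int], vocab: Dict[int, int]) -> List[int]:
    """Map tokens back to signed packet sizes, dropping padding."""
    inverse = {idx: size for size, idx in vocab.items()}
    return [inverse[t] for t in tokens if t != PAD]


def critic_loss(real_scores: Sequence[float], fake_scores: Sequence[float]) -> float:
    """Wasserstein critic loss: mean score on generated minus mean score on real."""
    return sum(fake_scores) / len(fake_scores) - sum(real_scores) / len(real_scores)


def generator_loss(fake_scores: Sequence[float]) -> float:
    """Wasserstein generator loss: negated mean critic score on generated samples."""
    return -sum(fake_scores) / len(fake_scores)


# Made-up example flows (assumption: sign encodes packet direction).
flows = [[66, -66, 54, 310, -1414], [66, -66, 54]]
vocab = build_vocab(flows)
encoded = [encode(flow, vocab, max_len=5) for flow in flows]
restored = [decode(tokens, vocab) for tokens in encoded]
```

In a pipeline like the one described, token sequences such as `encoded` would be the autoencoder's input, and the critic scores passed to `critic_loss` would be computed on latent vectors rather than on the tokens themselves.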
Pages: 70-79
Page count: 10
Related papers
30 in total
[1]  
[Anonymous], 2017, P INT C LEARN REPR
[2]  
[Anonymous], 2004, 3954 RFC IETF
[3]  
Antonakakis M, 2017, PROCEEDINGS OF THE 26TH USENIX SECURITY SYMPOSIUM (USENIX SECURITY '17), P1093
[4]   A Survey of Data Mining and Machine Learning Methods for Cyber Security Intrusion Detection [J].
Buczak, Anna L. ;
Guven, Erhan .
IEEE COMMUNICATIONS SURVEYS AND TUTORIALS, 2016, 18 (02) :1153-1176
[5]   Network Intrusion Detection for IoT Security Based on Learning Techniques [J].
Chaabouni, Nadia ;
Mosbah, Mohamed ;
Zemmari, Akka ;
Sauvignac, Cyrille ;
Faruki, Parvez .
IEEE COMMUNICATIONS SURVEYS AND TUTORIALS, 2019, 21 (03) :2671-2701
[6]  
Charlier J., 2019, ARXIV190809899
[7]  
Cheng A, 2019, 2019 IEEE 10TH ANNUAL INFORMATION TECHNOLOGY, ELECTRONICS AND MOBILE COMMUNICATION CONFERENCE (IEMCON), P728, DOI 10.1109/IEMCON.2019.8936224
[8]  
Donahue D., 2018, ARXIV181006640
[9]   Machine Learning DDoS Detection for Consumer Internet of Things Devices [J].
Doshi, Rohan ;
Apthorpe, Noah ;
Feamster, Nick .
2018 IEEE SYMPOSIUM ON SECURITY AND PRIVACY WORKSHOPS (SPW 2018), 2018, :29-35
[10]  
Foster D., 2019, Generative Deep Learning: Teaching Machines to Paint, Write, Compose, and Play