Data-Driven ICS Network Simulation for Synthetic Data Generation

Cited by: 1
Authors
Kim, Minseo [1]
Jeon, Seungho [2]
Cho, Jake [3]
Gong, Seonghyeon [3]
Affiliations
[1] Univ North Texas, Dept Comp Sci & Engn, Denton, TX 76205 USA
[2] Gachon Univ, Dept Comp Engn Smart Secur, Seongnam Si 1342, Gyeonggi Do, South Korea
[3] IIT, Dept Elect & Comp Engn, Chicago, IL 60616 USA
Keywords
industrial control system (ICS); synthetic data generation; data-driven simulation; machine learning; cybersecurity; intrusion detection
DOI
10.3390/electronics13101920
CLC number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Industrial control systems (ICSs) are integral to managing and optimizing processes in various industries, including manufacturing, power generation, and more. However, the scarcity of widely adopted ICS datasets hampers research efforts in areas like optimization and security. This scarcity arises due to the substantial cost and technical expertise required to create physical ICS environments. In response to these challenges, this paper presents a groundbreaking approach to generating synthetic ICS data through a data-driven ICS network simulation. We circumvent the need for expensive hardware by recreating the entire ICS environment in software. Moreover, rather than manually replicating the control logic of ICS components, we leverage existing data to autonomously generate control logic. The core of our method involves the stochastic setting of setpoints, which introduces randomness into the generated data. Setpoints serve as target values for controlling the operation of the ICS process. This approach enables us to augment existing ICS datasets and cater to the data requirements of machine learning-based ICS intrusion detection systems and other data-driven applications. Our simulated ICS environment employs virtualized containers to mimic the behavior of real-world PLCs and SCADA systems, while control logic is deduced from publicly available ICS datasets. Setpoints are generated probabilistically to ensure data diversity. Experimental results validate the fidelity of our synthetic data, emphasizing their ability to closely replicate temporal and statistical characteristics of real-world ICS networks. In conclusion, this innovative data-driven ICS network simulation offers a cost-effective and scalable solution for generating synthetic ICS data. It empowers researchers in the field of ICS optimization and security with diverse, realistic datasets, furthering advancements in this critical domain. Future work may involve refining the simulation model and exploring additional applications for synthetic ICS data.
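
The abstract's idea of stochastic setpoints can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the list of observed setpoint values, the hold-time range, and the first-order-lag process model are all assumptions made for illustration. The sketch draws setpoints at random from values that might have been seen in an existing dataset, holds each for a random number of steps, and lets a simple simulated process value track them.

```python
import random


def sample_setpoints(observed, n_steps, min_hold, max_hold, rng):
    """Build a piecewise-constant setpoint series of length n_steps.

    Each segment picks a value uniformly from `observed` (hypothetical
    setpoints taken from an existing dataset) and holds it for a random
    number of steps between min_hold and max_hold, inclusive.
    """
    series = []
    while len(series) < n_steps:
        value = rng.choice(observed)
        hold = rng.randint(min_hold, max_hold)
        series.extend([value] * hold)
    return series[:n_steps]


def simulate_process(setpoints, gain=0.2, start=0.0):
    """Assumed first-order lag: at each step the process value moves a
    fraction `gain` of the remaining distance toward the setpoint."""
    value = start
    trace = []
    for sp in setpoints:
        value += gain * (sp - value)
        trace.append(value)
    return trace


rng = random.Random(0)  # fixed seed so the run is repeatable
observed_setpoints = [40.0, 50.0, 60.0]  # hypothetical values, not from the paper
setpoints = sample_setpoints(observed_setpoints, 200, 10, 30, rng)
trace = simulate_process(setpoints, start=50.0)
```

Changing the seed yields a different but statistically similar trace, which is the sense in which random setpoints diversify the generated data. In the paper the control logic is learned from data; the fixed lag model here only stands in for it.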
Pages: 15