Validation of a Machine Learning-Based IDS Design Framework Using ORNL Datasets for Power System With SCADA

Cited by: 9
Authors
Zaman, Marzia [1]
Upadhyay, Darshana [1]
Lung, Chung-Horng [2]
Affiliations
[1] Cistel Technol Inc, Res & Dev Dept, Ottawa, ON K2E 7V7, Canada
[2] Carleton Univ, Dept Syst & Comp Engn, Ottawa, ON K1S 5B6, Canada
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
Intrusion detection; machine learning; generative adversarial network; SCADA systems; cyber-attacks; industrial control systems; grids
DOI
10.1109/ACCESS.2023.3326751
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Supervisory Control and Data Acquisition (SCADA) systems are widely used for remote monitoring and control of industrial processes, such as oil and gas production, power generation, transmission and distribution, and water treatment. Despite the enhanced accessibility, control, and data availability afforded by recent advances in communication technologies, the utilization of these technologies exposes critical infrastructures such as power systems to potential cyber threats. A Machine Learning (ML)-based Intrusion Detection System (IDS) seems promising; however, the development of ML models often requires custom methodologies for data preprocessing and training. This strategic approach is necessary for creating high-performance models that can be robustly evaluated and seamlessly integrated into real-time systems. As a result, we propose an ML-based IDS design framework for a SCADA-based power system incorporating effective modeling aspects, such as dataset preprocessing to ensure accurate representation, data augmentation for achieving a balanced dataset, automated feature selection to reduce dimensionality, and rigorous model training and testing procedures. To substantiate our proposed design framework, we conducted a series of experiments using a publicly available ORNL (Oak Ridge National Laboratory) dataset for a SCADA-based power system. The evaluation process encompasses efficient validation techniques with unseen data. Furthermore, the augmented dataset emerged through the aggregation of readings from four Phasor Measurement Units (PMUs) collected over a specific time span into a unified dataset. Among the assessed classifiers, the Random Forest (RF) model, trained on an augmented and balanced dataset, outperformed others, yielding an F1 score of 94.09% during testing with unseen data.
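The abstract reports model quality as an F1 score on unseen data. As a reminder of what that metric measures, the following is a minimal sketch of a binary F1 computation in plain Python. The function name `f1_score` and the label vectors are hypothetical illustrations, not taken from the paper, and the paper's own evaluation may use a different averaging scheme across classes.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Return F1 = 2 * precision * recall / (precision + recall) for one class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    # True positives: attack samples correctly flagged as attacks.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    # False positives: normal samples wrongly flagged as attacks.
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # False negatives: attack samples that were missed.
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    if tp == 0:
        # Precision and recall are both zero (or undefined); report 0.
        return 0.0

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels (1 = attack, 0 = normal); not data from the paper.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
score = f1_score(y_true, y_pred)
print(f"F1 = {score:.4f}")  # prints "F1 = 0.7500"
```

Because F1 is the harmonic mean of precision and recall, it penalizes a classifier that achieves high accuracy on an imbalanced dataset by ignoring the minority (attack) class, which is one reason it is a common choice for intrusion detection evaluation.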
Pages: 118414-118426
Number of pages: 13