Validation of a Machine Learning-Based IDS Design Framework Using ORNL Datasets for Power System With SCADA

Cited by: 9
Authors
Zaman, Marzia [1]
Upadhyay, Darshana [1]
Lung, Chung-Horng [2]
Affiliations
[1] Cistel Technol Inc, Res & Dev Dept, Ottawa, ON K2E 7V7, Canada
[2] Carleton Univ, Dept Syst & Comp Engn, Ottawa, ON K1S 5B6, Canada
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
Intrusion detection; machine learning; generative adversarial network; SCADA systems; cyber-attacks; industrial control systems; grids
DOI
10.1109/ACCESS.2023.3326751
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Supervisory Control and Data Acquisition (SCADA) systems are widely used for remote monitoring and control of industrial processes, such as oil and gas production, power generation, transmission and distribution, and water treatment. Despite the enhanced accessibility, control, and data availability afforded by recent advances in communication technologies, the utilization of these technologies exposes critical infrastructures such as power systems to potential cyber threats. A Machine Learning (ML)-based Intrusion Detection System (IDS) seems promising; however, the development of ML models often requires custom methodologies for data preprocessing and training. This strategic approach is necessary for creating high-performance models that can be robustly evaluated and seamlessly integrated into real-time systems. As a result, we propose an ML-based IDS design framework for a SCADA-based power system incorporating effective modeling aspects, such as dataset preprocessing to ensure accurate representation, data augmentation for achieving a balanced dataset, automated feature selection to reduce dimensionality, and rigorous model training and testing procedures. To substantiate our proposed design framework, we conducted a series of experiments using a publicly available ORNL (Oak Ridge National Laboratory) dataset for a SCADA-based power system. The evaluation process encompasses efficient validation techniques with unseen data. Furthermore, the augmented dataset emerged through the aggregation of readings from four Phasor Measurement Units (PMUs) collected over a specific time span into a unified dataset. Among the assessed classifiers, the Random Forest (RF) model, trained on an augmented and balanced dataset, outperformed others, yielding an F1 score of 94.09% during testing with unseen data.
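The abstract reports model quality as an F1 score measured on unseen data. As a reminder of what that metric computes, here is a minimal sketch for the binary case (attack vs. normal). The labels below are made up for illustration and are not taken from the ORNL dataset, and the paper may use a multi-class or averaged variant of F1 that this record does not specify.

```python
def f1_score(y_true, y_pred, positive=1):
    """Return (precision, recall, f1) for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical held-out labels (assumption): 1 = attack, 0 = normal.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]

precision, recall, f1 = f1_score(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Because F1 is the harmonic mean of precision and recall, it penalizes a classifier that trades one for the other. That matters for intrusion data, where attack and normal classes are often imbalanced.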
Pages: 118414-118426
Page count: 13