A Federated Learning Approach for Anomaly Detection in High Performance Computing

Cited by: 4
Authors
Farooq, Emmen [1]
Borghesi, Andrea [1]
Affiliations
[1] Univ Bologna, DISI, Bologna, Italy
Source
2023 IEEE 35TH INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE, ICTAI | 2023
Keywords
Federated Learning; High Performance Computing; Anomaly Detection; Machine Learning;
DOI
10.1109/ICTAI59109.2023.00079
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
High Performance Computing (HPC) systems are complex machines that must be operated at their maximum potential to recoup their investment cost and to mitigate their environmental impact. Anomalous conditions that hinder the correct usage of supercomputing nodes are a significant problem, so the development of automated anomaly detection techniques remains a vital area of research. Machine Learning (ML) models have proven to be good at detecting anomalies on individual nodes. However, the potential of combining data from multiple computing nodes and their associated ML models has not yet been explored. Federated Learning (FL) can address this shortcoming by allowing individual models to learn from each other. This paper applies FL to improve the performance of anomaly detection models for HPC systems. The approach has been validated on data from an actual supercomputer, improving the average F-score from 0.31 to 0.84. We also show how FL can significantly shorten the data collection period needed to create a training set: while ML models need, on average, 4.5 months of training data, FL reduces the training data needed to 1.2 weeks, a 15x reduction.
Pages: 496-500
Number of pages: 5
Related papers
14 in total
  • [11] Personalized federated learning framework for network traffic anomaly detection
    Pei, Jiaming
    Zhong, Kaiyang
    Jan, Mian Ahmad
    Li, Jinhai
    [J]. COMPUTER NETWORKS, 2022, 209
  • [12] AnoFed: Adaptive anomaly detection for digital health using transformer-based federated learning and support vector data description
    Raza, Ali
    Tran, Kim Phuc
    Koehl, Ludovic
    Li, Shujun
    [J]. ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2023, 121
  • [13] Framing Network Flow for Anomaly Detection Using Image Recognition and Federated Learning
    Toldinas, Jevgenijus
    Venckauskas, Algimantas
    Liutkevicius, Agnius
    Morkevicius, Nerijus
    [J]. ELECTRONICS, 2022, 11 (19)
  • [14] Light-weight federated learning-based anomaly detection for time-series data in industrial control systems
    Truong, Huong Thu
    Ta, Bac Phuong
    Le, Quang Anh
    Nguyen, Dan Minh
    Le, Cong Thanh
    Nguyen, Hoang Xuan
    Do, Ha Thu
    Nguyen, Hung Tai
    Tran, Kim Phuc
    [J]. COMPUTERS IN INDUSTRY, 2022, 140