Employing Streaming Machine Learning for Modeling Workload Patterns in Multi-Tiered Data Storage Systems

Cited by: 0
Authors
Lucas Filho, Edson Ramiro [1]
Savva, George [1]
Yang, Lun [2]
Fu, Kebo [2]
Shen, Jianqiang [2]
Herodotou, Herodotos [1]
Affiliations
[1] Cyprus Univ Technol, Dept Elect Engn Comp Engn & Informat, CY-3036 Limassol, Cyprus
[2] Huawei Technol Co Ltd, Shenzhen 518100, Peoples R China
Keywords
multi-tiered data storage systems; streaming machine learning; workload patterns; management
DOI
10.3390/fi17040170
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Modern multi-tiered data storage systems optimize file access by managing data across a hybrid composition of caches and storage tiers while using policies whose decisions can severely impact the storage system's performance. Recently, different Machine-Learning (ML) algorithms have been used to model access patterns from complex workloads. Yet, current approaches train their models offline in a batch-based approach, even though storage systems are processing a stream of file requests with dynamic workloads. In this manuscript, we advocate the streaming ML paradigm for modeling access patterns in multi-tiered storage systems as it introduces various advantages, including high efficiency, high accuracy, and high adaptability. Moreover, representative file access patterns, including temporal, spatial, length, and frequency patterns, are identified for individual files, directories, and file formats, and used as features. Streaming ML models are developed, trained, and tested on different file system traces for making two types of predictions: the next offset to be read in a file and the future file hotness. An extensive evaluation is performed with production traces provided by Huawei Technologies, showing that the models are practical, with low memory consumption (<1.3 MB) and low training delay (<1.8 ms per training instance), and can make accurate predictions online (0.98 F1 score and 0.07 MAE on average).
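The streaming setting described in the abstract, where a model predicts on each request before learning from it, can be illustrated with a minimal test-then-train (prequential) loop. This is a sketch under stated assumptions, not the authors' models or features: `StridePredictor`, its `alpha` parameter, and the toy `trace` are hypothetical names and values introduced only for illustration, and the model is a simple smoothed-stride heuristic standing in for a streaming ML regressor that predicts the next read offset.

```python
from dataclasses import dataclass


@dataclass
class StridePredictor:
    """Hypothetical online model: next offset = last offset + smoothed stride."""
    alpha: float = 0.5        # smoothing factor for the stride estimate (assumed value)
    last_offset: float = 0.0  # most recent offset seen
    stride: float = 0.0       # running estimate of the distance between reads
    seen: int = 0             # number of instances learned so far

    def predict(self) -> float:
        return self.last_offset + self.stride

    def learn(self, offset: float) -> None:
        # Update the stride estimate incrementally, one instance at a time.
        if self.seen > 0:
            observed = offset - self.last_offset
            if self.seen == 1:
                self.stride = observed
            else:
                self.stride += self.alpha * (observed - self.stride)
        self.last_offset = offset
        self.seen += 1


def prequential_mae(offsets, model):
    """Test-then-train: score each prediction before the model learns the true value."""
    total_error = 0.0
    count = 0
    for offset in offsets:
        if model.seen > 0:
            total_error += abs(model.predict() - offset)
            count += 1
        model.learn(offset)
    return total_error / count if count else 0.0


# Toy sequential trace with a constant 4 KiB stride (made-up data, not from the paper).
trace = [0, 4096, 8192, 12288, 16384, 20480]
model = StridePredictor()
mae = prequential_mae(trace, model)
print(f"prequential MAE: {mae}")
```

Because every instance is used first for testing and then for training, no separate offline training pass is needed, and the error estimate tracks how the model behaves as the workload changes.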
Pages: 37