Employing Streaming Machine Learning for Modeling Workload Patterns in Multi-Tiered Data Storage Systems

Cited by: 0

Authors:

Lucas Filho, Edson Ramiro [1]

Savva, George [1]

Yang, Lun [2]

Fu, Kebo [2]

Shen, Jianqiang [2]

Herodotou, Herodotos [1]

Affiliations:

[1] Cyprus Univ Technol, Dept Elect Engn Comp Engn & Informat, CY-3036 Limassol, Cyprus

[2] Huawei Technol Co Ltd, Shenzhen 518100, Peoples R China

Source:

FUTURE INTERNET | 2025 / Vol. 17 / No. 04

Keywords:

multi-tiered data storage systems; streaming machine learning; workload patterns; management

DOI:

10.3390/fi17040170

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Subject classification code:

0812

Abstract:

Modern multi-tiered data storage systems optimize file access by managing data across a hybrid composition of caches and storage tiers, using policies whose decisions can severely impact the storage system's performance. Recently, different machine learning (ML) algorithms have been used to model access patterns from complex workloads. Yet current approaches train their models offline in batches, even though storage systems process a continuous stream of file requests with dynamic workloads. In this manuscript, we advocate the streaming ML paradigm for modeling access patterns in multi-tiered storage systems, as it offers several advantages, including high efficiency, high accuracy, and high adaptability. Moreover, representative file access patterns, including temporal, spatial, length, and frequency patterns, are identified for individual files, directories, and file formats, and used as features. Streaming ML models are developed, trained, and tested on different file system traces to make two types of predictions: the next offset to be read in a file and the future file hotness. An extensive evaluation with production traces provided by Huawei Technologies shows that the models are practical, with low memory consumption (<1.3 MB) and low training delay (<1.8 ms per training instance), and can make accurate predictions online (0.98 F1 score and 0.07 MAE on average).
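The core idea of the abstract, learning from the request stream one instance at a time instead of retraining offline on batches, can be illustrated with a prequential (test-then-train) loop, the usual way online metrics such as MAE are computed in streaming ML. This is a minimal, hypothetical sketch, not the authors' implementation: the abstract does not name the model types, so `OnlineLinearRegressor`, the two features, the learning rate, and the synthetic hotness target are all assumptions made for illustration.

```python
import random


class OnlineLinearRegressor:
    """Hypothetical stand-in model: a linear regressor trained by per-instance SGD."""

    def __init__(self, n_features, lr=0.05):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.lr = lr

    def predict_one(self, x):
        return self.bias + sum(w * xi for w, xi in zip(self.weights, x))

    def learn_one(self, x, y):
        # One gradient step on the squared error of this single instance;
        # the model state stays the same size no matter how long the stream is.
        err = self.predict_one(x) - y
        self.weights = [w - self.lr * err * xi for w, xi in zip(self.weights, x)]
        self.bias -= self.lr * err


def prequential_mae(model, stream):
    """Test-then-train: each instance is scored before the model learns from it."""
    abs_err_sum, n = 0.0, 0
    for x, y in stream:
        abs_err_sum += abs(model.predict_one(x) - y)
        model.learn_one(x, y)
        n += 1
    return abs_err_sum / n if n else 0.0


def synthetic_stream(n, seed=0):
    """Hypothetical stream: two access features in [0, 1] and a noise-free hotness target."""
    rng = random.Random(seed)
    for _ in range(n):
        x = [rng.random(), rng.random()]  # e.g. normalized access frequency and recency
        yield x, 0.1 + 0.5 * x[0] + 0.2 * x[1]


if __name__ == "__main__":
    model = OnlineLinearRegressor(n_features=2)
    print(f"prequential MAE: {prequential_mae(model, synthetic_stream(5000)):.4f}")
```

Because every prediction is made before the instance is learned, the reported error reflects how the model would behave when deployed online, and the per-instance update keeps memory and training time per request bounded, which is the property the abstract's memory and training-delay figures refer to.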

Pages: 37

References: 73 in total
