Streaming Machine Learning for Supporting Data Prefetching in Modern Data Storage Systems

Cited by: 1
Authors
Lucas Filho, Edson Ramiro [1]
Yang, Lun [2]
Fu, Kebo [2]
Herodotou, Herodotos [1]
Affiliations
[1] Cyprus Univ Technol, Limassol, Cyprus
[2] Huawei Technol Co Ltd, Shenzhen, Peoples R China
Source
PROCEEDINGS OF THE 1ST WORKSHOP ON AI FOR SYSTEMS, AI4SYS 2023 | 2023
Keywords
multi-tiered storage systems; streaming machine learning; data prefetching; caching policies; tiering policies;
DOI
10.1145/3588982.3603608
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Modern data storage systems optimize data access by distributing data across multiple storage tiers and caches, based on numerous tiering and caching policies. The policies' decisions, and in particular the ones related to data prefetching, can severely impact the performance of the entire storage system. In recent years, various machine learning algorithms have been employed to model access patterns in complex data storage workloads. Even though data storage systems handle a constantly changing stream of file requests, current approaches continue to train their models offline in a batch-based approach. In this paper, we investigate the use of streaming machine learning to support data prefetching decisions in data storage systems as it introduces various advantages such as high training efficiency, high prediction accuracy, and high adaptability to changing workload patterns. After extracting a representative set of features in an online fashion, streaming machine learning models can be trained and tested while the system is running. To validate our methodology, we present one streaming classification model to predict the next file offset to be read in a file. We assess the model's performance using production traces provided by Huawei Technologies and demonstrate that streaming machine learning is a feasible approach with low memory consumption and minimal training delay, facilitating accurate predictions in real-time.
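The test-while-training loop described in the abstract can be illustrated with a minimal sketch. This is not the model from the paper: it assumes a toy frequency-based classifier (the hypothetical `StreamingOffsetPredictor`) that uses the last few offset deltas of a file as its only feature, and a hypothetical helper `prequential_accuracy` that scores each prediction before the model learns from the true value.

```python
from collections import Counter, defaultdict, deque


class StreamingOffsetPredictor:
    """Toy online classifier (an assumption, not the paper's model).

    It maps the last `history` offset deltas to the most frequently
    observed next delta, updating its counts one request at a time.
    """

    def __init__(self, history=2):
        self.history = history
        self.counts = defaultdict(Counter)  # context -> Counter of next deltas

    def predict(self, context):
        """Return the most frequent next delta for this context, or None."""
        seen = self.counts.get(context)
        if not seen:
            return None
        return seen.most_common(1)[0][0]

    def learn(self, context, next_delta):
        """Incrementally update the model with one observed transition."""
        self.counts[context][next_delta] += 1


def prequential_accuracy(offsets, history=2):
    """Test-then-train over a stream of read offsets for a single file."""
    model = StreamingOffsetPredictor(history)
    recent = deque(maxlen=history)  # online feature: the latest deltas
    correct = total = 0
    prev = None
    for offset in offsets:
        if prev is not None:
            delta = offset - prev
            if len(recent) == history:
                context = tuple(recent)
                # Test first: score the prediction before seeing the label.
                if model.predict(context) == delta:
                    correct += 1
                total += 1
                # Then train on the same example.
                model.learn(context, delta)
            recent.append(delta)
        prev = offset
    return correct / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical sequential read pattern with a 4 KiB stride.
    stream = [i * 4096 for i in range(10)]
    print(f"prequential accuracy: {prequential_accuracy(stream):.3f}")
```

Because every request is used once for testing and once for training, no separate offline training set is needed, and memory grows only with the number of distinct delta contexts seen so far.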
Pages: 7-12
Number of pages: 6