Symmetric active/active metadata service for high availability parallel file systems

Cited by: 7
Authors
He, Xubin [1]
Ou, Li
Engelmann, Christian [2]
Chen, Xin [1]
Scott, Stephen L. [2]
Affiliations
[1] Tennessee Technol Univ, Dept Elect & Comp Engn, Cookeville, TN 38505 USA
[2] Oak Ridge Natl Lab, Div Math & Comp Sci, Oak Ridge, TN 37831 USA
Funding
National Science Foundation (USA);
Keywords
Metadata management; Fault tolerance; High availability; Parallel file systems; Group communication; BROADCAST; TIME;
DOI
10.1016/j.jpdc.2009.08.004
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202 ;
Abstract
High availability data storage systems are critical for many applications as research and business become more data driven. Since metadata management is essential to system availability, multiple metadata services are used to improve the availability of distributed storage systems. Past research has focused on the active/standby model, where each active service has at least one redundant idle backup. However, interruption of service and even some loss of service state may occur during a fail-over, depending on the replication technique used. In addition, the replication overhead for multiple metadata services can be very high. The research in this paper targets the symmetric active/active replication model, which uses multiple redundant service nodes running in virtual synchrony. In this model, service node failures do not cause a fail-over to a backup, and there is no disruption of service or loss of service state. A fast delivery protocol is further discussed to reduce the latency of the total order broadcast needed. The prototype implementation shows that metadata service high availability can be achieved with an acceptable performance trade-off using the symmetric active/active metadata service solution. (C) 2009 Elsevier Inc. All rights reserved.
Pages: 961-973
Page count: 13
Related papers
15 in total
  • [1] Symmetric Active/Active High Availability for High-Performance Computing System Services
    Engelmann, Christian
    Scott, Stephen L.
    Leangsuksun, Chokchai
    He, Xubin
    JOURNAL OF COMPUTERS, 2006, 1 (08) : 43 - 54
  • [2] The State of the Art of Metadata Managements in Large-Scale Distributed File Systems - Scalability, Performance and Availability
    Dai, Hao
    Wang, Yang
    Kent, Kenneth B.
    Zeng, Lingfang
    Xu, Chengzhong
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2022, 33 (12) : 3850 - 3869
  • [3] MetaFlow: A Scalable Metadata Lookup Service for Distributed File Systems in Data Centers
    Sun, Peng
    Wen, Yonggang
    Duong Nguyen Binh Ta
    Xie, Haiyong
    IEEE TRANSACTIONS ON BIG DATA, 2018, 4 (02) : 203 - 216
  • [4] A Highly Reliable Metadata Service for Large-Scale Distributed File Systems
    Zhou, Jiang
    Chen, Yong
    Wang, Weiping
    He, Shuibing
    Meng, Dan
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2020, 31 (02) : 374 - 392
  • [5] COMET: Client-Oriented METadata Service for Highly Available Distributed File Systems
    Xue, Ruini
    Ao, Lixiang
    Guan, Zhongyang
    2015 27TH INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING (SBAC-PAD), 2015, : 154 - 161
  • [6] Availability improvement of active/standby cluster systems
    Park, K
    Kim, S
    PDPTA'2001: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED PROCESSING TECHNIQUES AND APPLICATIONS, 2001, : 1936 - 1941
  • [7] Active-Standby for High-Availability in FaaS
    Bouizem, Yasmina
    Dib, Djawida
    Parlavantzas, Nikos
    Morin, Christine
    PROCEEDINGS OF THE 2020 SIXTH INTERNATIONAL WORKSHOP ON SERVERLESS COMPUTING (WOSC '20), 2020, : 31 - 36
  • [8] A GPU-Accelerated In-Memory Metadata Management Scheme for Large-Scale Parallel File Systems
    Zhi-Guang Chen
    Yu-Bo Liu
    Yong-Feng Wang
    Yu-Tong Lu
    JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 2021, 36 : 44 - 55
  • [9] A GPU-Accelerated In-Memory Metadata Management Scheme for Large-Scale Parallel File Systems
    Chen, Zhi-Guang
    Liu, Yu-Bo
    Wang, Yong-Feng
    Lu, Yu-Tong
    JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 2021, 36 (01) : 44 - 55
  • [10] A Study of Failure Recovery and Logging of High-Performance Parallel File Systems
    Han, Runzhou
    Gatla, Om Rameshwar
    Zheng, Mai
    Cao, Jinrui
    Zhang, Di
    Dai, Dong
    Chen, Yong
    Cook, Jonathan
    ACM TRANSACTIONS ON STORAGE, 2022, 18 (02)