Best Practices and Lessons Learned from Deploying and Operating Large-Scale Data-Centric Parallel File Systems

Cited by: 31
Authors
Oral, Sarp [1]
Simmons, James [1]
Hill, Jason [1]
Leverman, Dustin [1]
Wang, Feiyi [1]
Ezell, Matt [1]
Miller, Ross [1]
Fuller, Douglas [1]
Gunasekaran, Raghul [1]
Kim, Youngjae [1]
Gupta, Saurabh [1]
Tiwari, Devesh [1]
Vazhkudai, Sudharshan S. [1]
Rogers, James H. [1]
Dillow, David [1]
Shipman, Galen M. [1]
Bland, Arthur S. [1]
Affiliations
[1] Oak Ridge Natl Lab, Oak Ridge Leadership Comp Facil, Oak Ridge, TN 37830 USA
Source
SC14: INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS | 2014
DOI
10.1109/SC.2014.23
Chinese Library Classification (CLC) number
TP301 [Theory and Methods]
Discipline classification code
081202
Abstract
The Oak Ridge Leadership Computing Facility (OLCF) has deployed multiple large-scale parallel file systems (PFS) to support its operations. During this process, OLCF acquired significant expertise in large-scale storage system design, file system software development, technology evaluation, benchmarking, procurement, deployment, and operational practices. Based on the lessons learned from each new PFS deployment, OLCF improved its operating procedures and strategies. This paper provides an account of our experience and lessons learned in acquiring, deploying, and operating large-scale parallel file systems. We believe that these lessons will be useful to the wider HPC community.
Pages: 217 - 228
Page count: 12
Related papers
50 in total
  • [1] A Novel Data-Centric Programming Model for Large-Scale Parallel Systems
    Talia, Domenico
    Trunfio, Paolo
    Marozzo, Fabrizio
    Belcastro, Loris
    Garcia-Blas, Javier
    del Rio, David
    Couvee, Philippe
    Goret, Gael
    Vincent, Lionel
    Fernandez-Pena, Alberto
    Martin de Blas, Daniel
    Nardi, Mirko
    Pizzuti, Teresa
    Spataru, Adrian
    Justyna, Marek
    EURO-PAR 2019: PARALLEL PROCESSING WORKSHOPS, 2020, 11997 : 452 - 463
  • [2] Best Practices for Deploying a CMDB in large-scale Environments
    Keller, Alexander
    Subramanian, Suraj
    2009 IFIP/IEEE INTERNATIONAL SYMPOSIUM ON INTEGRATED NETWORK MANAGEMENT (IM 2009) VOLS 1 AND 2, 2009, : 732 - 745
  • [3] A Data-Centric Approach for Analyzing Large-Scale Deep Learning Applications
    Vineet, S. Sai
    Joseph, Natasha Meena
    Korgaonkar, Kunal
    Paul, Arnab K.
    PROCEEDINGS OF THE 24TH INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING AND NETWORKING, ICDCN 2023, 2023, : 282 - 283
  • [4] Lessons Learned from Large-Scale Refactoring
    Wright, Hyrum K.
    2019 IEEE INTERNATIONAL CONFERENCE ON SOFTWARE MAINTENANCE AND EVOLUTION (ICSME 2019), 2019, : 366 - 366
  • [5] RecSysOps: Best Practices for Operating a Large-Scale Recommender System
    Saberian, Mohammad
    Basilico, Justin
    15TH ACM CONFERENCE ON RECOMMENDER SYSTEMS (RECSYS 2021), 2021, : 590 - 591
  • [6] Lessons learned and best practices derived from environmental monitoring at a large-scale CO2 injection project
    Leroux, Kerryanne M.
    Azzolina, Nicholas A.
    Glazewski, Kyle A.
    Kalenze, Nicholas S.
    Botnen, Barry W.
    Kovacevich, Justin T.
    Abongwa, Pride T.
    Thompson, Jeffrey S.
    Zacher, Erick J.
    Hamling, John A.
    Gorecki, Charles D.
    INTERNATIONAL JOURNAL OF GREENHOUSE GAS CONTROL, 2018, 78 : 254 - 270
  • [7] Lessons Learned from Developing and Deploying a Large-Scale Employer Name Normalization System for Online Recruitment
    Liu, Qiaoling
    Chao, Josh
    Mahoney, Thomas
    Chern, Alan
    Min, Chris
    Javed, Faizan
    Jijkoun, Valentin
    KDD'18: PROCEEDINGS OF THE 24TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING, 2018, : 556 - 565
  • [8] A Data-Centric Storage Approach for Monitoring System of Large-Scale Smart Grid
    Wang, Yan
    Deng, Qingxu
    Liu, Wei
    Song, Baoyan
    2012 INTERNATIONAL CONFERENCE ON WIRELESS COMMUNICATIONS, NETWORKING AND MOBILE COMPUTING (WICOM), 2012,
  • [9] Large-Scale Prediction of Drug-Target Interaction: a Data-Centric Review
    Tiejun Cheng
    Ming Hao
    Takako Takeda
    Stephen H. Bryant
    Yanli Wang
    The AAPS Journal, 2017, 19 : 1264 - 1275
  • [10] Large-Scale Prediction of Drug-Target Interaction: a Data-Centric Review
    Cheng, Tiejun
    Hao, Ming
    Takeda, Takako
    Bryant, Stephen H.
    Wang, Yanli
    AAPS JOURNAL, 2017, 19 (05): : 1264 - 1275