Optimizing High-Performance Computing Systems for Biomedical Workloads

Cited by: 0
Authors
Kovatch, Patricia [1]
Gai, Lili [2]
Cho, Hyung Min [2]
Fluder, Eugene [2]
Jiang, Dansha [2]
Affiliations
[1] Icahn School of Medicine at Mount Sinai, Department of Genetics and Genomic Sciences, New York, NY 10029, USA
[2] Icahn School of Medicine at Mount Sinai, Office of the Dean, New York, NY 10029, USA
Source
2020 IEEE 34th International Parallel and Distributed Processing Symposium Workshops (IPDPSW 2020), 2020
Keywords
high performance computing; computational biology; genomics; system optimization; scheduling; parallel file systems; cloud technologies; sustainability
DOI
10.1109/IPDPSW50202.2020.00040
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
The productivity of computational biologists is limited by the speed of their workflows and the resulting overall job throughput. Because most biomedical researchers focus on understanding scientific phenomena rather than on developing and optimizing code, a computing and data system implemented in an ad hoc or non-optimized manner can impede scientific discovery. In our experience, most computational life-science applications do not leverage the full capabilities of high-performance computing, so tuning a system for these applications is especially critical. To optimize a system effectively, systems staff must understand how the applications affect the system. Effective stewardship of the system includes analyzing the impact of the applications on the compute cores, file system, resource manager, and queuing policies. The resulting improved system design, together with a sustainability plan, helps to provide a long-term resource for productive computational and data science. We present a case study of a typical biomedical computational workload at a leading academic medical center supporting over $100 million per year in computational biology research. Over the past eight years, our high-performance computing system has enabled over 900 biomedical publications in four major areas: genetics and population analysis, gene expression, machine learning, and structural and chemical biology. We have upgraded the system several times in response to trends, actual usage, and user feedback. Major components crucial to this evolution include scheduling structure and policies, memory size, compute type and speed, parallel file system capabilities, and deployment of cloud technologies. We evolved a 70-teraflop machine into a 1.4-petaflop machine in seven years and grew our user base nearly 10-fold. For long-term stability and sustainability, we established a chargeback fee structure.
Our guiding principle for each upgrade has been to increase scientific throughput and enable enhanced scientific fidelity with minimal impact on existing user workflows or code. Optimizing the system under these tight constraints has presented unique challenges and led us to adopt new approaches. We share the practical strategies that have emerged from our ongoing growth and assessments.
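As a quick check on the growth figures stated in the abstract, going from a 70-teraflop to a 1.4-petaflop machine in seven years corresponds to the overall factor and average annual rate below. This is a sketch that assumes steady geometric growth over the whole period; the abstract does not describe the year-by-year upgrade trajectory, and peak flops are only one of the components it lists.

```latex
% Overall growth factor (1 PFLOPS = 1000 TFLOPS) and implied compound annual growth rate r
\[
\frac{1.4\ \text{PFLOPS}}{70\ \text{TFLOPS}} = \frac{1400\ \text{TFLOPS}}{70\ \text{TFLOPS}} = 20,
\qquad
r = 20^{1/7} - 1 \approx 1.534 - 1 = 0.534
\]
% Implied average doubling time under the same steady-growth assumption
\[
t_{\text{double}} = \frac{\ln 2}{\ln 20^{1/7}} = \frac{7 \ln 2}{\ln 20} \approx \frac{4.852}{2.996} \approx 1.62\ \text{years}
\]
```

Under that assumption, peak capability grew by roughly 53% per year, doubling about every 1.6 years on average.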
Pages: 183-192
Page count: 10