Wrangling distributed computing for high-throughput environmental science: An introduction to HTCondor

Cited by: 13
Authors
Erickson, Richard A. [1]
Fienen, Michael N. [2]
McCalla, S. Grace [1, 4]
Weiser, Emily L. [1]
Bower, Melvin L. [1]
Knudson, Jonathan M. [1]
Thain, Greg [3]
Affiliations
[1] US Geol Survey, Upper Midwest Environm Sci Ctr, La Crosse, WI 54601 USA
[2] US Geol Survey, Wisconsin Water Sci Ctr, Middleton, WI USA
[3] Univ Wisconsin, Dept Comp Sci, 1210 W Dayton St, Madison, WI 53706 USA
[4] Univ Wisconsin, Wisconsin Inst Discovery, Madison, WI USA
Keywords
CLIMATE-CHANGE; IMPACTS;
DOI
10.1371/journal.pcbi.1006468
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010; 081704;
Abstract
Biologists and environmental scientists now routinely solve computational problems that were unimaginable a generation ago. Examples include processing geospatial data, analyzing -omics data, and running large-scale simulations. Conventional desktop computing cannot handle these tasks when they are large, and high-performance computing is not always available nor the most appropriate solution for all computationally intense problems. High-throughput computing (HTC) is one method for handling computationally intense research. In contrast to high-performance computing, which uses a single "supercomputer," HTC can distribute tasks over many computers (e.g., idle desktop computers, dedicated servers, or cloud-based resources). HTC facilities exist at many academic and government institutes and are relatively easy to create from commodity hardware. Additionally, consortia such as Open Science Grid facilitate HTC, and commercial entities sell cloud-based solutions for researchers who lack HTC at their institution. We provide an introduction to HTC for biologists and environmental scientists. Our examples from biology and the environmental sciences use HTCondor, an open source HTC system.
Pages: 8