Argobots: A Lightweight Low-Level Threading and Tasking Framework

Cited by: 73
Authors
Seo, Sangmin [1]
Amer, Abdelhalim [1]
Balaji, Pavan [1]
Bordage, Cyril [2]
Bosilca, George [4]
Brooks, Alex [3]
Carns, Philip [1]
Castello, Adrian [6]
Genet, Damien [4]
Herault, Thomas [4]
Iwasaki, Shintaro [5]
Jindal, Prateek [3]
Kale, Laxmikant V. [3]
Krishnamoorthy, Sriram [7]
Lifflander, Jonathan [8]
Lu, Huiwei [9]
Meneses, Esteban [10, 11]
Snir, Marc [3]
Sun, Yanhua [12]
Taura, Kenjiro [5]
Beckman, Pete [1]
Affiliations
[1] Argonne Natl Lab, Lemont, IL 60439 USA
[2] Inria Bordeaux, F-33405 Talence, France
[3] Univ Illinois, Champaign, IL 61820 USA
[4] Univ Tennessee, Knoxville, TN 37996 USA
[5] Univ Tokyo, Bunkyo Ku, Tokyo 1138654, Japan
[6] Univ Jaume 1, Castellon De La Plana 12071, Castellon, Spain
[7] Pacific Northwest Natl Lab, Richland, WA 99354 USA
[8] Sandia Natl Labs, Livermore, CA 94551 USA
[9] Tencent, Shenzhen 518057, Peoples R China
[10] Costa Rica Natl High Technol Ctr, San Jose 10109, Costa Rica
[11] Costa Rica Inst Technol, Cartago 30101, Costa Rica
[12] Google, Mountain View, CA 94043 USA
Keywords
Argobots; user-level thread; tasklet; OpenMP; MPI; I/O; interoperability; lightweight; context switch; stackable scheduler
DOI
10.1109/TPDS.2017.2766062
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
In the past few decades, a number of user-level threading and tasking models have been proposed in the literature to address the shortcomings of OS-level threads, primarily with respect to cost and flexibility. Current state-of-the-art user-level threading and tasking models, however, either are too specific to applications or architectures or are not as powerful or flexible. In this paper, we present Argobots, a lightweight, low-level threading and tasking framework that is designed as a portable and performant substrate for high-level programming models or runtime systems. Argobots offers a carefully designed execution model that balances generality of functionality with providing a rich set of controls to allow specialization by end users or high-level programming models. We describe the design, implementation, and performance characterization of Argobots and present integrations with three high-level models: OpenMP, MPI, and colocated I/O services. Evaluations show that (1) Argobots, while providing richer capabilities, is competitive with existing simpler generic threading runtimes; (2) our OpenMP runtime offers more efficient interoperability capabilities than production OpenMP runtimes do; (3) when MPI interoperates with Argobots instead of Pthreads, it enjoys reduced synchronization costs and better latency-hiding capabilities; and (4) I/O services with Argobots reduce interference with colocated applications while achieving performance competitive with that of a Pthreads approach.
Pages: 512-526
Page count: 15