Scaling Irregular Applications through Data Aggregation and Software Multithreading

Cited: 17
Authors
Morari, Alessandro [1]
Tumeo, Antonino [1]
Chavarria-Miranda, Daniel [1]
Villa, Oreste [2]
Valero, Mateo [3]
Affiliations
[1] Pacific NW Natl Lab, Richland, WA 99352 USA
[2] NVIDIA, Santa Clara, CA USA
[3] Univ Politecn Catalunya, Barcelona Supercomputing Ctr, Barcelona, Spain
Source
2014 IEEE 28TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM | 2014
Keywords
COMMUNICATION; PERFORMANCE;
DOI
10.1109/IPDPS.2014.117
CLC Number
TP3 [Computing technology, computer technology];
Discipline Code
0812 ;
Abstract
Emerging applications in areas such as bioinformatics, data analytics, semantic databases, and knowledge discovery employ datasets ranging from tens to hundreds of terabytes. Currently, only distributed-memory clusters provide enough aggregate memory for in-memory processing of datasets of this size. However, in addition to their large size, the data structures used by these new application classes are usually characterized by unpredictable, fine-grained accesses: i.e., they exhibit irregular behavior. Traditional commodity clusters, by contrast, rely on cache-based processors and high-bandwidth networks optimized for locality, regular computation, and bulk communication. For these reasons, irregular applications run inefficiently on these systems and require custom, hand-coded optimizations to scale in both performance and size. Lightweight software multithreading, which tolerates data-access latencies by overlapping network communication with computation, and aggregation, which reduces overheads and increases bandwidth utilization by coalescing fine-grained network messages, are key techniques that can speed up large-scale irregular applications on commodity clusters. In this paper we describe GMT (Global Memory and Threading), a runtime system library that couples software multithreading and message aggregation with a Partitioned Global Address Space (PGAS) data model to enable higher performance and scaling of irregular applications on multi-node systems. We present the architecture of the runtime, explaining how it is designed around these two critical techniques. We show that irregular applications written using our runtime can outperform, even by orders of magnitude, the corresponding applications written using other programming models that do not exploit these techniques.
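The abstract's central idea of message aggregation can be sketched in a few lines. The snippet below is an illustrative model, not GMT's actual API: the `AggregationBuffer` class, its `capacity` parameter, and the `transport` callback are all hypothetical names chosen for the example. It shows how many fine-grained remote writes destined for the same node can be coalesced into a few bulk messages, which is the overhead-reduction mechanism the abstract describes.

```python
# Illustrative sketch of message aggregation (hypothetical API, not GMT's):
# fine-grained remote puts are buffered per destination node and sent as
# one bulk message when the buffer fills, cutting per-message overhead.

class AggregationBuffer:
    def __init__(self, capacity, transport):
        self.capacity = capacity    # entries buffered before an automatic flush
        self.transport = transport  # callable(dest, batch): sends one bulk message
        self.pending = {}           # dest node -> list of (addr, value) entries

    def put(self, dest, addr, value):
        """Queue one fine-grained remote write; flush if the batch is full."""
        batch = self.pending.setdefault(dest, [])
        batch.append((addr, value))
        if len(batch) >= self.capacity:
            self.flush(dest)

    def flush(self, dest):
        """Send all pending entries for one destination as a single message."""
        batch = self.pending.pop(dest, None)
        if batch:
            self.transport(dest, batch)

    def flush_all(self):
        for dest in list(self.pending):
            self.flush(dest)


# Count network messages for 1000 fine-grained puts spread over 4 nodes.
sent = []  # one entry per bulk message actually "sent"
buf = AggregationBuffer(capacity=64,
                        transport=lambda dest, batch: sent.append(len(batch)))
for i in range(1000):
    buf.put(dest=i % 4, addr=i, value=i * i)
buf.flush_all()
print(len(sent), sum(sent))  # → 16 1000: 16 bulk messages carry all 1000 puts
```

Without aggregation, the same workload would issue 1000 individual network messages; the buffer reduces that to 16 while delivering every write, which is why coalescing improves bandwidth utilization for irregular access patterns.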
Pages: 10