Exploring Accelerator and Parallel Graph Algorithmic Choices for Temporal Graphs

被引：26

作者：

Rehman, Akif ^{[1
]}

Ahmad, Masab ^{[1
]}

Khan, Omer ^{[1
]}

机构：

[1] Univ Connecticut, Storrs, CT 06269 USA

来源：

PROCEEDINGS OF THE ELEVENTH INTERNATIONAL WORKSHOP ON PROGRAMMING MODELS AND APPLICATIONS FOR MULTICORES AND MANYCORES, PMAM 2020 | 2020年

基金：

美国国家科学基金会;

关键词：

multicores; graph algorithms; static graphs; temporal graphs; performance scaling;

D O I：

10.1145/3380536.3380540

中图分类号：

TP31 [计算机软件];

学科分类号：

081202 ; 0835 ;

摘要：

Many real-world systems utilize graphs that are time-varying in nature, where edges appear and disappear with respect to time. Moreover, the weights of different edges are also a function of time. Various conventional graph algorithms, such as single source shortest path (SSSP) have been developed for time-varying graphs. However, these algorithms are sequential in nature and their parallel counterparts are largely overlooked. On the other hand, parallel algorithms for static graphs are implemented as ordered and unordered variants. Unordered implementations do not enforce local or global order for processing tasks in parallel, but incur redundant task processing to converge their solutions. These implementations expose parallelism at the cost of high redundant work. Relax-ordered implementations maintain local order through per-core priority queues to reduce the amount of redundant work, while exposing parallelism. Finally, strict-ordered implementations achieve the work efficiency of sequential version by enforcing a global order at the expense of high thread synchronizations. These parallel implementations are adopted for temporal graphs to explore the choices that provide optimal performance on different parallel accelerators. This work shows that selecting the optimal parallel implementation extracts geometric performance gain of 46.38% on Intel Xeon-40 core and 20.30% on NVidia GTX-1080 GPU. It is also shown that optimal implementation choices for temporal graphs are not always the same as their respective static graphs.

引用

页码：61 / 70

页数：10

共 38 条

[1] HeteroMap: A Runtime Performance Predictor for Efficient Processing of Graph Analytics on Heterogeneous Multi-Accelerators [J].

Ahmad, Masab ;

Dogan, Halit ;

Michael, Christopher J. ;

Khan, Omer .

2019 IEEE INTERNATIONAL SYMPOSIUM ON PERFORMANCE ANALYSIS OF SYSTEMS AND SOFTWARE (ISPASS), 2019, :268-281

[2] CRONO: A Benchmark Suite for Multithreaded Graph Algorithms Executing on Futuristic Multicores [J].

Ahmad, Masab ;

Hijaz, Farrukh ;

Shi, Qingchuan ;

Khan, Omer .

2015 IEEE INTERNATIONAL SYMPOSIUM ON WORKLOAD CHARACTERIZATION (IISWC), 2015, :44-55

[3]

[Anonymous], 2008, Journal on Data Semantics XI

[4] Locality Exists in Graph Processing: Workload Characterization on an Ivy Bridge Server [J].

Beamer, Scott ;

Asanovic, Krste ;

Patterson, David .

2015 IEEE INTERNATIONAL SYMPOSIUM ON WORKLOAD CHARACTERIZATION (IISWC), 2015, :56-65

[5]

Bojarski Mariusz, 2016, arXiv

[6] ChronoGraph: Enabling Temporal Graph Traversals for Efficient Information Diffusion Analysis over Time [J].

Byun, Jaewook ;

Woo, Sungpil ;

Kim, Daeyoung .

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2020, 32 (03) :424-437

[7] The Chronnectome: Time-Varying Connectivity Networks as the Next Frontier in fMRI Data Discovery [J].

Calhoun, Vince D. ;

Miller, Robyn ;

Pearlson, Godfrey ;

Adali, Tulay .

NEURON, 2014, 84 (02) :262-274

[8]

Casteigts A, 2012, INT J PARALLEL EMERG, V27, P387, DOI 10.1007/978-3-642-22450-8_27

[9]

Che S, 2013, I S WORKL CHAR PROC, P185, DOI 10.1109/IISWC.2013.6704684

[10]

Che SA, 2009, I S WORKL CHAR PROC, P44, DOI 10.1109/IISWC.2009.5306797

← 1 2 3 4 →