Unleashing Fine-Grained Parallelism on Embedded Many-Core Accelerators with Lightweight OpenMP Tasking

Cited by: 22

Authors:

Tagliavini, Giuseppe [1]

Cesarini, Daniele [1]

Marongiu, Andrea [2]

Affiliations:

[1] Univ Bologna, Dept Elect Elect & Informat Engn DEI, I-40126 Bologna, BO, Italy

[2] Univ Bologna, Dept Comp Sci & Engn DISI, I-40126 Bologna, BO, Italy

Source:

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS | 2018 / Vol. 29 / No. 09

Funding:

European Union Horizon 2020;

Keywords:

Heterogeneous embedded systems on chip; programmable many-core accelerators; tasking; OpenMP; SYSTEMS; SUPPORT;

DOI:

10.1109/TPDS.2018.2814602

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods];

Discipline classification code:

081202 ;

Abstract:

In recent years, programmable many-core accelerators (PMCAs) have been introduced in embedded systems to satisfy stringent performance/Watt requirements. This has increased the need for programming models capable of effectively leveraging hundreds to thousands of processors. Task-based parallelism has the potential to provide such capabilities, offering high-level abstractions to express the abundant and irregular parallelism in embedded applications. However, efficiently supporting this programming paradigm on embedded PMCAs is challenging because of the large time and space overheads it introduces. In this paper we describe a lightweight OpenMP tasking runtime environment (RTE) design for a state-of-the-art embedded PMCA, the Kalray MPPA 256. We provide an exhaustive characterization of the costs of our RTE, considering both synthetic workloads and real programs, and we compare it to several other tasking RTEs. Experimental results confirm that our solution achieves near-ideal parallelization speedups for tasks as small as 5K cycles, and an average speedup of 12× for real benchmarks, which is approximately 60% higher than what we observe with the original Kalray OpenMP implementation.

Citation

Pages: 2150-2163

Page count: 14

4 items in total

[1] Lightweight Virtual Memory Support for Many-Core Accelerators in Heterogeneous Embedded SoCs
Vogel, Pirmin
Marongiu, Andrea
Benini, Luca
2015 INTERNATIONAL CONFERENCE ON HARDWARE/SOFTWARE CODESIGN AND SYSTEM SYNTHESIS (CODES+ISSS), 2015, : 45 - 54
[2] Fine-grained adaptive parallelism for automotive systems through AMALTHEA and OpenMP
Munera, Adrian
Royuela, Sara
Pressler, Michael
Mackamul, Harald
Ziegenbein, Dirk
Quinones, Eduardo
JOURNAL OF SYSTEMS ARCHITECTURE, 2024, 146
[3] X-OpenMP - eXtreme fine-grained tasking using lock-less work stealing
Nookala, Poornima
Chard, Kyle
Raicu, Ioan
FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2024, 159 : 444 - 458
[4] Lightweight Chip Multi-Threading (LCMT): Maximizing Fine-Grained Parallelism On-Chip
Li, Sheng
Kuntz, Shannon
Brockman, Jay B.
Kogge, Peter M.
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2011, 22 (07) : 1178 - 1191
