Unleashing Fine-Grained Parallelism on Embedded Many-Core Accelerators with Lightweight OpenMP Tasking

Cited by: 22
Authors
Tagliavini, Giuseppe [1]
Cesarini, Daniele [1]
Marongiu, Andrea [2]
Affiliations
[1] Univ Bologna, Dept Elect Elect & Informat Engn DEI, I-40126 Bologna, BO, Italy
[2] Univ Bologna, Dept Comp Sci & Engn DISI, I-40126 Bologna, BO, Italy
Funding
EU Horizon 2020;
Keywords
Heterogeneous embedded systems on chip; programmable many-core accelerators; tasking; OpenMP; SYSTEMS; SUPPORT;
DOI
10.1109/TPDS.2018.2814602
Chinese Library Classification (CLC) number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
In recent years, programmable many-core accelerators (PMCAs) have been introduced in embedded systems to satisfy stringent performance/Watt requirements. This has increased the urge for programming models capable of effectively leveraging hundreds to thousands of processors. Task-based parallelism has the potential to provide such capabilities, offering high-level abstractions to outline abundant and irregular parallelism in embedded applications. However, efficiently supporting this programming paradigm on embedded PMCAs is challenging, due to the large time and space overheads it introduces. In this paper we describe a lightweight OpenMP tasking runtime environment (RTE) design for a state-of-the-art embedded PMCA, the Kalray MPPA 256. We provide an exhaustive characterization of the costs of our RTE, considering both synthetic workloads and real programs, and we compare it to several other tasking RTEs. Experimental results confirm that our solution achieves near-ideal parallelization speedups for tasks as small as 5K cycles, and an average speedup of 12x for real benchmarks, which is approximately 60% higher than what we observe with the original Kalray OpenMP implementation.
Pages: 2150-2163
Page count: 14
Related papers
4 in total
  • [1] Lightweight Virtual Memory Support for Many-Core Accelerators in Heterogeneous Embedded SoCs
    Vogel, Pirmin
    Marongiu, Andrea
    Benini, Luca
    2015 INTERNATIONAL CONFERENCE ON HARDWARE/SOFTWARE CODESIGN AND SYSTEM SYNTHESIS (CODES+ISSS), 2015, : 45 - 54
  • [2] Fine-grained adaptive parallelism for automotive systems through AMALTHEA and OpenMP
    Munera, Adrian
    Royuela, Sara
    Pressler, Michael
    Mackamul, Harald
    Ziegenbein, Dirk
    Quinones, Eduardo
    JOURNAL OF SYSTEMS ARCHITECTURE, 2024, 146
  • [3] X-OpenMP - eXtreme fine-grained tasking using lock-less work stealing
    Nookala, Poornima
    Chard, Kyle
    Raicu, Ioan
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2024, 159 : 444 - 458
  • [4] Lightweight Chip Multi-Threading (LCMT): Maximizing Fine-Grained Parallelism On-Chip
    Li, Sheng
    Kuntz, Shannon
    Brockman, Jay B.
    Kogge, Peter M.
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2011, 22 (07) : 1178 - 1191