Lightweight asynchronous scheduling in heterogeneous reconfigurable systems

Cited by: 3
Authors
Rodriguez, Andres [1]
Navarro, Angeles [1]
Nikov, Kris [2]
Nunez-Yanez, Jose [2]
Gran, Ruben [3]
Gracia, Dario Suarez [3]
Asenjo, Rafael [1]
Affiliations
[1] Univ Malaga, Dept Comp Architecture, Malaga, Spain
[2] Univ Bristol, Dept Elect & Elect Engn, Bristol, Avon, England
[3] Univ Zaragoza, Comp Architecture Grp, Zaragoza, Spain
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK
Keywords
Heterogeneous architecture; FPGA; Heterogeneous scheduling; Throughput model; Energy efficiency
DOI
10.1016/j.sysarc.2022.102398
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
The trend for heterogeneous embedded systems is the integration of accelerators and general-purpose CPU cores on the same die. In these integrated architectures, like the Zynq UltraScale+ board (CPU+FPGA) that we target in this work, hardware support for shared memory and low-overhead synchronization between the accelerator and the CPU cores makes the case for exploring strategies that exploit a tight collaboration between the CPUs and the accelerator. In this paper we propose a novel lightweight scheduling strategy, FastFit, targeted to FPGA accelerators, and a new scheduler based on it, named MultiFastFit, which asynchronously tackles heterogeneous systems composed of a variety of CPU cores and FPGA IPs. Our strategy significantly reduces the overhead of automatically computing the near-optimal chunksizes when compared to a previous state-of-the-art auto-tuned approach, which makes our approach more suitable for fine-grained applications. Additionally, our scheduler MultiFastFit has been designed to enable the efficient co-execution of work among compute devices in such a way that all the devices are busy while minimizing the load imbalance. Our approaches have been evaluated using four benchmarks carefully tuned for the low-power UltraScale+ platform. Our experiments demonstrate that the FastFit strategy always finds the near-optimal FPGA chunksize for any device configuration at a reasonable cost, even for fine-grained and irregular applications, and that heterogeneous CPU+FPGA co-executions that exploit all the compute devices are usually faster and more energy efficient than the CPU-only and FPGA-only executions. We have also compared MultiFastFit with other state-of-the-art scheduling strategies, finding that it outperforms other auto-tuned approaches by up to 2x and achieves results similar to those of manually tuned schedulers without requiring an offline search for the ideal CPU-FPGA partition or FPGA chunk granularity.
Pages: 14
Related papers
  • [1] Online Task Scheduling for Heterogeneous Reconfigurable Systems
    Zhou, Xuegong
    Liang, Liang
    Wang, Ying
    Peng, Chenglian
    COMPUTER SUPPORTED COOPERATIVE WORK IN DESIGN IV, 2008, 5236 : 596 - +
  • [2] An Energy Aware Scheduling for Reconfigurable Heterogeneous Systems
    Ghribi, Ines
    Ben Abdallah, Riadh
    Khalgui, Mohamed
    ICSOFT: PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON SOFTWARE TECHNOLOGIES, 2017, : 171 - 177
  • [3] Task scheduling for heterogeneous reconfigurable computers
    Ahmadinia, A
    Bobda, C
    Koch, D
    Majer, M
    Teich, J
    SBCCI 2004: 17TH SYMPOSIUM ON INTEGRATED CIRCUITS AND SYSTEMS DESIGN, PROCEEDINGS, 2004, : 22 - 27
  • [4] Heterogeneous reconfigurable systems
    Rabaey, JM
    Abnous, A
    Ichikawa, Y
    Seno, K
    Wan, M
    SIPS 97 - 1997 IEEE WORKSHOP ON SIGNAL PROCESSING SYSTEMS: DESIGN AND IMPLEMENTATION, 1997, : 24 - 34
  • [5] Evaluation and Proposal of a Lightweight Reconfigurable Accelerator for Heterogeneous Multicore
    Silva Junior, Francisco Carlos
    Silva, Ivan Saraiva
    Jacobi, Ricardo Pezzuol
    IEEE LATIN AMERICA TRANSACTIONS, 2021, 19 (04) : 559 - 566
  • [6] An Approximation Algorithm for Scheduling on Heterogeneous Reconfigurable Resources
    Nahapetian, Ani
    Brisk, Philip
    Ghiasi, Soheil
    Sarrafzadeh, Majid
    ACM TRANSACTIONS ON EMBEDDED COMPUTING SYSTEMS, 2009, 9 (01) : 5
  • [7] Asynchronous wrapper for heterogeneous systems
    Bormann, DS
    Cheung, PYK
    INTERNATIONAL CONFERENCE ON COMPUTER DESIGN - VLSI IN COMPUTERS AND PROCESSORS, PROCEEDINGS, 1997, : 307 - 314
  • [8] Task Modules Partitioning, Scheduling and Floorplanning for Partially Dynamically Reconfigurable Systems with Heterogeneous Resources
    Ding, Bo
    Huang, Jinglei
    Wang, Junpeng
    Xu, Qi
    Chen, Song
    Kang, Yi
    ACM TRANSACTIONS ON DESIGN AUTOMATION OF ELECTRONIC SYSTEMS, 2023, 28 (06)
  • [9] A Lightweight and High-Throughput Asynchronous Message Bus for Communication in Multi-Core Heterogeneous Systems
    Zeng, Qingyang
    Wang, Jingyu
    Cong, Jiaxu
    Shang, Delong
    IEEE ACCESS, 2024, 12 : 48555 - 48569
  • [10] Lightweight Framework for Reliable Job Scheduling in Heterogeneous Clouds
    Abdulazeez, Muhammed
    Garncarek, Pawel
    Wong, Prudence W. H.
    2017 26TH INTERNATIONAL CONFERENCE ON COMPUTER COMMUNICATION AND NETWORKS (ICCCN 2017), 2017,