Lightweight asynchronous scheduling in heterogeneous reconfigurable systems

Cited by: 3
Authors
Rodriguez, Andres [1]
Navarro, Angeles [1]
Nikov, Kris [2]
Nunez-Yanez, Jose [2]
Gran, Ruben [3]
Gracia, Dario Suarez [3]
Asenjo, Rafael [1]
Affiliations
[1] Univ Malaga, Dept Comp Architecture, Malaga, Spain
[2] Univ Bristol, Dept Elect & Elect Engn, Bristol, Avon, England
[3] Univ Zaragoza, Comp Architecture Grp, Zaragoza, Spain
Funding
UK Engineering and Physical Sciences Research Council (EPSRC);
Keywords
Heterogeneous architecture; FPGA; Heterogeneous scheduling; Throughput model; Energy efficiency;
DOI
10.1016/j.sysarc.2022.102398
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812 ;
Abstract
The trend for heterogeneous embedded systems is the integration of accelerators and general-purpose CPU cores on the same die. In these integrated architectures, like the Zynq UltraScale+ board (CPU+FPGA) that we target in this work, hardware support for shared memory and low-overhead synchronization between the accelerator and the CPU cores makes the case for exploring strategies that exploit a tight collaboration between the CPUs and the accelerator. In this paper we propose a novel lightweight scheduling strategy, FastFit, targeted to FPGA accelerators, and a new scheduler based on it, named MultiFastFit, which asynchronously tackles heterogeneous systems composed of a variety of CPU cores and FPGA IPs. Our strategy significantly reduces the overhead of automatically computing near-optimal chunksizes when compared to a previous state-of-the-art auto-tuned approach, which makes our approach more suitable for fine-grained applications. Additionally, our scheduler MultiFastFit has been designed to enable the efficient co-execution of work among compute devices in such a way that all the devices are kept busy while the load imbalance is minimized. Our approaches have been evaluated using four benchmarks carefully tuned for the low-power UltraScale+ platform. Our experiments demonstrate that the FastFit strategy always finds the near-optimal FPGA chunksize for any device configuration at a reasonable cost, even for fine-grained and irregular applications, and that heterogeneous CPU+FPGA co-executions that exploit all the compute devices are usually faster and more energy efficient than the CPU-only and FPGA-only executions. We have also compared MultiFastFit with other state-of-the-art scheduling strategies, finding that it outperforms other auto-tuned approaches by up to 2x and that it achieves results similar to those of manually-tuned schedulers without requiring an offline search for the ideal CPU-FPGA partition or FPGA chunk granularity.
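The co-execution idea in the abstract (every device stays busy and load imbalance stays small) can be illustrated with a small simulation. This is only a sketch under stated assumptions and is not the FastFit or MultiFastFit algorithm, whose details the abstract does not give. The function name `co_execute`, the device names, and the throughput figures are all hypothetical. The sketch assumes each device has a fixed, known throughput, and it sizes each device's chunk in proportion to that throughput so that all chunks take roughly the same time.

```python
import heapq


def co_execute(total_items, throughputs, fastest_chunk):
    """Simulate dynamic, chunk-based co-execution (illustrative only).

    throughputs:   dict of device name -> items per second (assumed constant).
    fastest_chunk: chunk size handed to the fastest device; slower devices
                   get proportionally smaller chunks.
    Returns (makespan in seconds, dict of items processed per device).
    """
    fastest = max(throughputs.values())
    # Throughput-proportional chunk sizes, at least one item each.
    chunk = {d: max(1, round(fastest_chunk * t / fastest))
             for d, t in throughputs.items()}

    # Min-heap of (time at which the device becomes idle, device name).
    idle = [(0.0, d) for d in sorted(throughputs)]
    heapq.heapify(idle)

    done = {d: 0 for d in throughputs}
    finish = {d: 0.0 for d in throughputs}
    remaining = total_items

    # Whichever device becomes idle first takes the next chunk.
    while remaining > 0:
        now, dev = heapq.heappop(idle)
        size = min(chunk[dev], remaining)
        remaining -= size
        done[dev] += size
        finish[dev] = now + size / throughputs[dev]
        heapq.heappush(idle, (finish[dev], dev))

    return max(finish.values()), done


if __name__ == "__main__":
    # Hypothetical throughputs: one FPGA IP and two CPU cores.
    devices = {"fpga": 800.0, "cpu0": 100.0, "cpu1": 100.0}
    makespan, done = co_execute(10_000, devices, fastest_chunk=80)
    print(f"co-execution makespan: {makespan:.2f} s, split: {done}")
    print(f"FPGA-only time:        {10_000 / devices['fpga']:.2f} s")
```

In this toy model the co-execution finishes sooner than the FPGA working alone because the CPU cores absorb part of the iteration space. A real scheduler would also have to measure throughput at run time, since the fixed values assumed here are not known in advance.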
Pages: 14
Related papers
50 in total
  • [21] Scheduling and operator control in reconfigurable assembly systems
    Gyulai, David
    Kadar, Botond
    Monostori, Laszlo
    MANUFACTURING SYSTEMS 4.0, 2017, 63 : 459 - 464
  • [22] Online hybrid task scheduling in reconfigurable systems
    Liang, Liang
    Zhou, Xue-Gong
    Wang, Ying
    Peng, Cheng-Lian
    PROCEEDINGS OF THE 2007 11TH INTERNATIONAL CONFERENCE ON COMPUTER SUPPORTED COOPERATIVE WORK IN DESIGN, VOLS 1 AND 2, 2007, : 1072 - +
  • [23] Efficient task scheduling for runtime reconfigurable systems
    Fazlali, Mahmood
    Sabeghi, Mojtaba
    Zakerolhosseini, Ali
    Bertels, Koen
    JOURNAL OF SYSTEMS ARCHITECTURE, 2010, 56 (11) : 623 - 632
  • [24] Reconfigurable discrete wavelet transform processor for heterogeneous reconfigurable multimedia systems
    Tseng, PC
    Huang, CT
    Chen, LG
    JOURNAL OF VLSI SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, 2005, 41 (01) : 35 - 47
  • [25] Reconfigurable Discrete Wavelet Transform Processor for Heterogeneous Reconfigurable Multimedia Systems
    Po-Chih Tseng
    Chao-Tsung Huang
    Liang-Gee Chen
    Journal of VLSI signal processing systems for signal, image and video technology, 2005, 41 : 35 - 47
  • [26] Heterogeneous Systems with Reconfigurable Neuromorphic Computing Accelerators
    Li, Sicheng
    Liu, Xiaoxiao
    Mao, Menglie
    Li, Hai
    Chen, Yiran
    Li, Boxun
    Wang, Yu
    2016 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS), 2016, : 125 - 128
  • [27] Optimizing Matrix Multiplication on Heterogeneous Reconfigurable Systems
    Zhuo, Ling
    Prasanna, Viktor K.
    PARALLEL COMPUTING: ARCHITECTURES, ALGORITHMS AND APPLICATIONS, 2008, 15 : 561 - 568
  • [28] Building Heterogeneous Reconfigurable Systems Using Threads
    Agron, Jason
    Andrews, David
    FPL: 2009 INTERNATIONAL CONFERENCE ON FIELD PROGRAMMABLE LOGIC AND APPLICATIONS, 2009, : 435 - 438
  • [29] HARNESS: Heterogeneous adaptable reconfigurable networked systems
    Dongarra, J
    Fagg, G
    Geist, A
    Kohl, JA
    Papadopoulos, PM
    Scott, SL
    Sunderam, V
    Magliardi, M
    SEVENTH INTERNATIONAL SYMPOSIUM ON HIGH PERFORMANCE DISTRIBUTED COMPUTING - PROCEEDINGS, 1998, : 358 - 359
  • [30] MosaicSim: A Lightweight, Modular Simulator for Heterogeneous Systems
    Matthews, Opeoluwa
    Manocha, Aninda
    Giri, Davide
    Orenes-Vera, Marcelo
    Tureci, Esin
    Sorensen, Tyler
    Ham, Tae Jun
    Aragon, Juan L.
    Carloni, Luca P.
    Martonosi, Margaret
    2020 IEEE INTERNATIONAL SYMPOSIUM ON PERFORMANCE ANALYSIS OF SYSTEMS AND SOFTWARE (ISPASS), 2020, : 136 - 148