Cascade: An Application Pipelining Toolkit for Coarse-Grained Reconfigurable Arrays

Authors
Melchert, Jackson [1]
Mei, Yuchen [1]
Koul, Kalhan [1]
Liu, Qiaoyi [1]
Horowitz, Mark [1]
Raina, Priyanka [1]
Affiliations
[1] Stanford Univ, Dept Elect Engn, Stanford, CA 94305 USA
Keywords
Pipeline processing; Registers; Delays; Routing; Integrated circuit interconnections; Field programmable gate arrays; Switches; Accelerator compilers; application pipelining; coarse-grained reconfigurable arrays (CGRAs); hardware accelerators
DOI
10.1109/TCAD.2024.3390542
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
While coarse-grained reconfigurable arrays (CGRAs) have emerged as promising programmable accelerator architectures, they require automatic pipelining of applications during their compilation flow to achieve high performance. Current CGRA compilers either lack pipelining altogether, resulting in low application performance, or perform exhaustive pipelining, resulting in high power and resource consumption. We address these challenges by proposing Cascade, an end-to-end open-source application compiler for CGRAs that achieves both state-of-the-art performance and fast compilation times. The contributions of this work are: 1) a novel post place-and-route (PnR) application pipelining technique for CGRAs that accounts for interconnect hop delays during pipelining, but in a unique way that avoids cyclic scheduling and PnR; 2) a register resource usage optimization technique that leverages the scheduling logic in CGRA memory tiles to minimize the number of register resources used during pipelining; and 3) an automated CGRA timing model generator, an application timing analysis tool, and a large set of existing and novel application pipelining techniques integrated into an end-to-end compilation flow. Cascade achieves 8-34x lower critical path delay and 7-190x lower energy-delay product (EDP) across a variety of dense image processing and machine learning workloads, and 3-5.2x lower critical path delay and 2.5-5.2x lower EDP on sparse workloads, compared to a compiler without pipelining. Cascade mitigates the performance and energy-efficiency drawbacks of existing CGRA compilers, and enables further research into CGRAs as flexible yet competitive accelerator architectures.
Pages: 3055-3067
Page count: 13