Cascade: An Application Pipelining Toolkit for Coarse-Grained Reconfigurable Arrays

Times cited: 0

Authors:

Melchert, Jackson [1]

Mei, Yuchen [1]

Koul, Kalhan [1]

Liu, Qiaoyi [1]

Horowitz, Mark [1]

Raina, Priyanka [1]

Affiliation:

[1] Stanford University, Department of Electrical Engineering, Stanford, CA 94305, USA

Source:

IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS | 2024 / Vol. 43 / No. 10

Keywords:

Pipeline processing; Registers; Delays; Routing; Integrated circuit interconnections; Field programmable gate arrays; Switches; Accelerator compilers; application pipelining; coarse-grained reconfigurable arrays (CGRAs); hardware accelerators

DOI:

10.1109/TCAD.2024.3390542

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

While coarse-grained reconfigurable arrays (CGRAs) have emerged as promising programmable accelerator architectures, they require automatic pipelining of applications during compilation to achieve high performance. Current CGRA compilers either lack pipelining altogether, resulting in low application performance, or perform exhaustive pipelining, resulting in high power and resource consumption. We address these challenges by proposing Cascade, an end-to-end open-source application compiler for CGRAs that achieves both state-of-the-art performance and fast compilation times. The contributions of this work are: 1) a novel post place-and-route (PnR) application pipelining technique for CGRAs that accounts for interconnect hop delays during pipelining, but in a unique way that avoids cyclic scheduling and PnR; 2) a register resource usage optimization technique that leverages the scheduling logic in CGRA memory tiles to minimize the number of register resources used during pipelining; and 3) an automated CGRA timing model generator, an application timing analysis tool, and a large set of existing and novel application pipelining techniques, all integrated into an end-to-end compilation flow. Compared to a compiler without pipelining, Cascade achieves 8-34x lower critical path delay and 7-190x lower energy-delay product (EDP) across a variety of dense image processing and machine learning workloads, and 3-5.2x lower critical path delay and 2.5-5.2x lower EDP on sparse workloads. Cascade mitigates the performance and energy-efficiency drawbacks of existing CGRA compilers and enables further research into CGRAs as flexible, yet competitive, accelerator architectures.
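To make the idea of delay-aware pipelining after PnR more concrete, the following minimal Python sketch assumes a single, already-routed path whose per-hop delays are known (for example, from a timing model). It greedily places pipeline registers so that no register-to-register segment exceeds a target clock period. The names `Hop` and `insert_pipeline_registers`, the integer picosecond delays, and the greedy strategy are hypothetical illustrations, not Cascade's algorithm or API. In particular, the sketch ignores where registers are physically available in the interconnect and the delay matching that reconvergent paths would need.

```python
from dataclasses import dataclass


@dataclass
class Hop:
    name: str      # hypothetical label for one routed segment (tile or switch hop)
    delay_ps: int  # assumed combinational delay of this hop, in picoseconds


def insert_pipeline_registers(path: list[Hop], clock_period_ps: int) -> list[int]:
    """Greedily choose register locations along one routed path.

    Returns the indices of hops after which a pipeline register is placed,
    so that the accumulated delay between registers never exceeds
    clock_period_ps. Illustrative sketch only, not Cascade's method.
    """
    registers: list[int] = []
    accumulated = 0
    for i, hop in enumerate(path):
        if hop.delay_ps > clock_period_ps:
            # No register placement can fix a single hop slower than the clock.
            raise ValueError(f"hop {hop.name} alone exceeds the clock period")
        if accumulated + hop.delay_ps > clock_period_ps:
            # Adding this hop would violate timing: register after the previous hop.
            registers.append(i - 1)
            accumulated = 0
        accumulated += hop.delay_ps
    return registers


if __name__ == "__main__":
    route = [Hop("a", 400), Hop("b", 300), Hop("c", 500), Hop("d", 200), Hop("e", 600)]
    print(insert_pipeline_registers(route, 1000))  # → [1, 3]
```

Each register placed this way adds one cycle of latency to the path. That is why a real compiler must also balance latency across paths that reconverge, and why minimizing the number of registers used, as the abstract's second contribution targets, matters for power and resources.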


Pages: 3055-3067

Number of pages: 13
