Cascade: An Application Pipelining Toolkit for Coarse-Grained Reconfigurable Arrays

Cited by: 0

Authors:

Melchert, Jackson [1]

Mei, Yuchen [1]

Koul, Kalhan [1]

Liu, Qiaoyi [1]

Horowitz, Mark [1]

Raina, Priyanka [1]

Affiliation:

[1] Stanford Univ, Dept Elect Engn, Stanford, CA 94305 USA

Source:

IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS | 2024 / Vol. 43 / No. 10

Keywords:

Pipeline processing; Registers; Delays; Routing; Integrated circuit interconnections; Field programmable gate arrays; Switches; Accelerator compilers; application pipelining; coarse-grained reconfigurable arrays (CGRAs); hardware accelerators

DOI:

10.1109/TCAD.2024.3390542

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

While coarse-grained reconfigurable arrays (CGRAs) have emerged as promising programmable accelerator architectures, they require automatic pipelining of applications during their compilation flow to achieve high performance. Current CGRA compilers either lack pipelining altogether, resulting in low application performance, or perform exhaustive pipelining, resulting in high power and resource consumption. We address these challenges by proposing Cascade, an end-to-end open-source application compiler for CGRAs that achieves both state-of-the-art performance and fast compilation times. The contributions of this work are: 1) a novel post place-and-route (PnR) application pipelining technique for CGRAs that accounts for interconnect hop delays during pipelining, but in a unique way that avoids cyclic scheduling and PnR, 2) a register resource usage optimization technique that leverages the scheduling logic in CGRA memory tiles to minimize the number of register resources used during pipelining, and 3) an automated CGRA timing model generator, an application timing analysis tool, and a large set of existing and novel application pipelining techniques integrated into an end-to-end compilation flow. Cascade achieves 8-34x lower critical path delay and 7-190x lower energy-delay product (EDP) across a variety of dense image processing and machine learning workloads, and 3-5.2x lower critical path delay and 2.5-5.2x lower EDP on sparse workloads, compared to a compiler without pipelining. Cascade mitigates the performance and energy-efficiency drawbacks of existing CGRA compilers, and enables further research into CGRAs as flexible, yet competitive accelerator architectures.

Pages: 3055-3067

Page count: 13
