Energy Efficient Hardware Loop Based Optimization for CGRAs

Cited by: 1

Authors:

Sunny, Chilankamol [1]

Das, Satyajit [1]

Martin, Kevin J. M. [2]

Coussy, Philippe [2]

Affiliations:

[1] IIT Palakkad, Palakkad, Kerala, India

[2] Univ Bretagne Sud, UMR 6285, Lab STICC, F-56100 Lorient, France

Source:

Journal of Signal Processing Systems for Signal, Image, and Video Technology | 2022, Vol. 94, No. 9

Keywords:

Coarse grained reconfigurable array (CGRA); Loop optimization; Hardware loop; Loop unrolling; Power

DOI:

10.1007/s11265-022-01760-9

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline code:

0812

Abstract:

Research interest and industry investment in edge computing solutions have increased dramatically in recent years. The resulting quest for a balance of performance, energy efficiency and flexibility has made Coarse Grained Reconfigurable Array (CGRA) architectures increasingly popular. To further improve performance and energy efficiency, several hardware- and software-based loop optimizations have been adopted for CGRAs. In this paper, we propose a centralized hardware-based loop optimization technique that achieves better area and energy results than the previously implemented distributed version. Without incurring any performance degradation, the area overhead relative to the reference architecture is reduced to 1.5% for a 4x2 CGRA configuration. Compared to the baseline model employing a software loop, the centralized hardware loop reduces energy consumption by up to 47.3%, with an arithmetic mean reduction of 27.2%. Furthermore, the paper explores the co-existence of CGRA-specific hardware and software optimizations and their impact on loop efficiency. Coupling loop unrolling with centralized hardware loop support improves the results further: the combination achieves up to a 68.7% reduction in energy consumption and a 5.46x speed-up against the baseline model with no optimizations applied.
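The abstract couples loop unrolling with hardware loop support. As a generic illustration only, not the authors' CGRA mapping or compiler flow, the Python sketch below unrolls a dot-product loop by a factor of 4 and counts how many loop iterations execute. In a software loop, each iteration pays for loop-control work (counter update, compare, branch), so fewer iterations means less of that overhead. The function names, the dot-product kernel and the unroll factor of 4 are hypothetical choices made for this example.

```python
def dot_rolled(a, b):
    """Reference loop: one multiply-accumulate per iteration.

    Returns (result, iterations); `iterations` counts how many times
    loop-control work would be paid in a software loop.
    """
    acc = 0
    iterations = 0
    for i in range(len(a)):
        acc += a[i] * b[i]
        iterations += 1
    return acc, iterations


def dot_unrolled4(a, b):
    """Same computation with the loop body replicated 4 times.

    The main loop runs len(a) // 4 times; a short epilogue handles the
    remaining len(a) % 4 elements so any length is supported.
    """
    n = len(a)
    acc = 0
    iterations = 0
    main = n - n % 4  # largest multiple of 4 not exceeding n
    for i in range(0, main, 4):
        # Four multiply-accumulates share one loop-control step.
        acc += a[i] * b[i]
        acc += a[i + 1] * b[i + 1]
        acc += a[i + 2] * b[i + 2]
        acc += a[i + 3] * b[i + 3]
        iterations += 1
    for i in range(main, n):
        # Epilogue: leftover elements, one per iteration.
        acc += a[i] * b[i]
        iterations += 1
    return acc, iterations
```

For example, with `a = list(range(10))` and `b = [1] * 10`, `dot_rolled` returns `(45, 10)` while `dot_unrolled4` returns `(45, 4)`: two passes of the unrolled body plus two epilogue iterations. A hardware loop, in the general sense of zero-overhead loop support, additionally takes the remaining per-iteration counter update and branch out of the instruction stream; how the paper's centralized variant achieves this on a CGRA is not modeled here.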

Pages: 895-912

Page count: 18

References (27 in total):

[1] Bajwa, RS; Hiraki, M; Kojima, H; Gorny, DJ; Nitta, K; Shridhar, A; Seki, K; Sasaki, K. Instruction buffering to reduce power in processors for signal processing. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 1997, 5(4): 417-424.
[2] Balasubramanian, M. Design, Automation and Test in Europe (DATE), 2018, P1069. DOI 10.23919/DATE.2018.8342170.
[3] Bielecki, Wlodzimierz; Skotnicki, Piotr. Insight into tiles generated by means of a correction technique. Journal of Supercomputing, 2019, 75(5): 2665-2690.
[4] Das, S. Thesis, Lorient, 2018.
[5] Das, Satyajit; Martin, Kevin J. M.; Rossi, Davide; Coussy, Philippe; Benini, Luca. An Energy-Efficient Integrated Programmable Array Accelerator and Compilation Flow for Near-Sensor Ultralow Power Processing. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2019, 38(6): 1095-1108.
[6] Das, Satyajit; Martin, Kevin J. M.; Coussy, Philippe; Rossi, Davide. A Heterogeneous Cluster with Reconfigurable Accelerator for Energy Efficient Near-Sensor Data Analytics. 2018 IEEE International Symposium on Circuits and Systems (ISCAS), 2018.
[7] Das, S. Asia and South Pacific Design Automation Conference (ASP-DAC), 2017, P127. DOI 10.1109/ASPDAC.2017.7858308.
[8] Dragomir, O.S. ARCHITECTURES COMPIL, 2010, P6164.
[9] Gautschi, Michael; Schiavone, Pasquale Davide; Traber, Andreas; Loi, Igor; Pullini, Antonio; Rossi, Davide; Flamand, Eric; Gurkaynak, Frank K.; Benini, Luca. Near-Threshold RISC-V Core With DSP Extensions for Scalable IoT Endpoint Devices. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2017, 25(10): 2700-2713.
[10] Gobieski, Graham; Atli, Ahmet Oguz; Mai, Kenneth; Lucia, Brandon; Beckmann, Nathan. SNAFU: An Ultra-Low-Power, Energy-Minimal CGRA-Generation Framework and Architecture. 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA 2021), 2021: 1027-1040.
