Enabling Voltage Over-Scaling in Multiplierless DSP Architectures via Algorithm-Hardware Co-Design

Cited by: 0

Authors:

Eleftheriadis, Charalampos [1]

Chatzitsompanis, Georgios [1]

Karakonstantis, Georgios [1]

Affiliations:

[1] Queen's University Belfast, Institute of Electronics, Communications and Information Technology, Belfast BT3 9DT, Northern Ireland

Source:

IEEE Transactions on Very Large Scale Integration (VLSI) Systems | 2024 / Vol. 32 / No. 2

Funding:

European Union Horizon 2020;

Keywords:

Approximate discrete cosine transform (DCT); approximate fast Fourier transform (FFT); multiplierless; time-multiplexing; voltage over-scaling (VOS); LOW-POWER; TRANSFORM; RESILIENT; METHODOLOGY;

DOI:

10.1109/TVLSI.2023.3308607

Chinese Library Classification (CLC):

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

The design of low-power digital signal processing (DSP) architectures has gained a lot of attention due to their use in a variety of smart edge applications and portable devices. Recent efforts have focused on replacing power-hungry multipliers with various approximation frameworks, such as multiplierless architectures that require only a few bit-shifts, additions, and/or multiplexers when the multiplicand coefficients are known a priori. However, most existing multiplierless and approximation-based works have not been combined systematically with voltage over-scaling (VOS), which is considered one of the most effective power-saving approaches, while the few that have attempted this were applied to specific case studies with custom modifications. In this article, we propose a generic optimization framework that not only minimizes the hardware units in any time-multiplexed directed acyclic graph (TM-DAG) multiplier but also allows the reliable completion of most operations and the avoidance of random timing errors under VOS. This is achieved by synthesizing alternative coefficients that closely approximate the original ones while also activating shorter critical paths. As a result, when VOS is applied, only minor quality degradation occurs, due to the coefficient approximations, which are deterministic by design, while the timing slack gained from the new multiplicands allows us to reduce the supply voltage and circumvent the random timing errors induced by the increased delay under iso-frequency/throughput. Our experiments indicate that when our framework is applied to fast Fourier transform (FFT) and discrete cosine transform (DCT) architectures, it yields up to 34.07% power savings compared to conventional multiplierless architectures, while inducing minimal signal-to-noise ratio (SNR) degradation, even when the voltage is reduced by up to 20%.
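
The idea of trading a small, deterministic coefficient error for a shorter shift-and-add chain can be illustrated with a minimal Python sketch. This is not the authors' framework: it ignores the TM-DAG structure, multiplexers, and real timing analysis, and it assumes that the number of nonzero digits in a canonical signed-digit (CSD) encoding is a rough proxy for the number of adders on the path. The function names and the example coefficient 181 (roughly cos(pi/4) scaled by 256) are hypothetical choices made for illustration.

```python
def csd_digits(n: int) -> list[int]:
    """CSD digits of a non-negative integer, least significant first.

    Each digit is -1, 0 or +1, and no two adjacent digits are nonzero.
    """
    digits = []
    while n != 0:
        if n & 1:
            d = 2 - (n & 3)  # +1 if n % 4 == 1, -1 if n % 4 == 3
            n -= d
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits


def shift_add_multiply(x: int, digits: list[int]) -> int:
    """Multiply x by the constant encoded in `digits` using only shifts and adds."""
    acc = 0
    for shift, d in enumerate(digits):
        if d:
            acc += d * (x << shift)
    return acc


def nonzero_count(n: int) -> int:
    """Number of nonzero CSD digits (assumed proxy for adder count)."""
    return sum(1 for d in csd_digits(n) if d)


def cheaper_coefficient(c: int, max_error: int) -> int:
    """Pick the integer within max_error of c with the fewest nonzero CSD digits.

    Ties are broken in favour of the smaller approximation error.
    """
    candidates = range(max(0, c - max_error), c + max_error + 1)
    return min(candidates, key=lambda k: (nonzero_count(k), abs(k - c)))


if __name__ == "__main__":
    original = 181  # hypothetical fixed-point coefficient
    approx = cheaper_coefficient(original, max_error=4)
    print(original, nonzero_count(original))
    print(approx, nonzero_count(approx))
    print(shift_add_multiply(10, csd_digits(approx)))
```

In this toy setting the approximation error is fixed at design time, which mirrors the abstract's point that the quality loss is deterministic rather than caused by random timing violations. Whether the shorter chain actually yields usable timing slack depends on the synthesized hardware, which this sketch does not model.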

Pages: 219-230

Page count: 12