A High-Speed, Energy-Efficient Two-Cycle Multiply-Accumulate (MAC) Architecture and Its Application to a Double-Throughput MAC Unit

Cited by: 36

Authors:

Hoang, Tung Thanh ^{[1]}

Sjalander, Magnus ^{[1]}

Larsson-Edefors, Per ^{[1]}

Affiliations:

[1] Chalmers Univ Technol, Dept Comp Sci & Engn, SE-41296 Gothenburg, Sweden

Source:

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS | 2010 / Vol. 57 / No. 12

Keywords:

Arithmetic circuits; energy efficient; high speed; multiply-accumulate unit; variable wordlength; DESIGN;

DOI:

10.1109/TCSI.2010.2091191

Chinese Library Classification (CLC):

TM [Electrical engineering]; TN [Electronics and communication technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

We propose a high-speed and energy-efficient two-cycle multiply-accumulate (MAC) architecture that supports two's complement numbers, and includes accumulation guard bits and saturation circuitry. The first MAC pipeline stage contains only partial-product generation circuitry and a reduction tree, while the second stage, thanks to a special sign-extension solution, implements all other functionality. Place-and-route evaluations using a 65-nm 1.1-V cell library show that the proposed architecture offers a 31% improvement in speed and a 32% reduction in energy per operation, averaged across operand sizes of 16, 32, 48, and 64 bits, over a reference two-cycle MAC architecture that employs a multiplier in the first stage and an accumulator in the second. When operating the proposed architecture at the lower frequency of the reference architecture, the available timing slack can be used to downsize gates, resulting in a 52% reduction in energy compared to the reference. We extend the new architecture to create a versatile double-throughput MAC (DTMAC) unit that efficiently performs either multiply-accumulate or multiply operations for N-bit, 1 x N/2-bit, or 2 x N/2-bit operands. In comparison to a fixed-function 32-bit MAC unit, 16-bit multiply-accumulate operations can be executed with 67% higher energy efficiency on a 32-bit DTMAC unit.
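The arithmetic behavior named in the abstract (two's complement operands, accumulation guard bits, saturation, and a 2 x N/2-bit mode) can be illustrated with a small behavioral model. This is only a sketch under stated assumptions, not the paper's circuit: it does not model the two pipeline stages or the sign-extension scheme, and the function names (`to_signed`, `saturate`, `mac`, `dual_mac`), the default of 8 guard bits, and the choice to saturate at the full accumulator width are all hypothetical choices made for illustration.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of `value` as a two's complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def saturate(value, bits):
    """Clamp `value` to the range of a `bits`-wide two's complement number."""
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    return max(lo, min(hi, value))


def mac(acc, a, b, n=16, guard=8):
    """One multiply-accumulate step on n-bit two's complement operands.

    Assumption: the accumulator is 2n + guard bits wide and saturates
    at that width (the guard-bit count is a hypothetical default).
    """
    product = to_signed(a, n) * to_signed(b, n)
    return saturate(acc + product, 2 * n + guard)


def dual_mac(acc_hi, acc_lo, a, b, n=32, guard=8):
    """Hypothetical 2 x N/2-bit mode: two independent half-width MACs
    on the upper and lower halves of the n-bit operand words."""
    half = n // 2
    mask = (1 << half) - 1
    lo = mac(acc_lo, a & mask, b & mask, half, guard)
    hi = mac(acc_hi, (a >> half) & mask, (b >> half) & mask, half, guard)
    return hi, lo
```

In this model the guard bits let the accumulator absorb several full-scale products before clamping: with 16-bit operands and 8 guard bits, repeatedly accumulating (-32768) x (-32768) only reaches the saturation limit after a few hundred steps.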


Pages: 3073-3081

Page count: 9
