New method for high performance multiply-accumulator design

Cited by: 8
Authors
Xia, Bing-jie [1]
Liu, Peng [1]
Yao, Qing-dong [1]
Affiliation
[1] Department of Information Science and Electronic Engineering, Zhejiang University, Hangzhou 310027, Zhejiang, People's Republic of China
Source
Journal of Zhejiang University-Science A | 2009 / Vol. 10 / No. 7
Funding
National Natural Science Foundation of China
Keywords
Multiply-accumulator (MAC); Pipeline; Compressor; Partial product reduction tree (PPRT); Split structure; Low power; Adder; Architecture; Tree
DOI
10.1631/jzus.A0820566
Chinese Library Classification (CLC) number
T [Industrial technology]
Subject classification code
08
Abstract
This study presents a new four-stage pipelined, high-performance split multiply-accumulator (MAC) architecture that supports multiple precisions and is intended for media processors. To further increase speed, it proposes a novel partial product compression circuit based on interleaved adders and a modified hybrid partial product reduction tree (PPRT) scheme. The MAC can perform 1-way 32-bit or 4-way 16-bit signed/unsigned multiply and multiply-accumulate operations, as well as 2-way parallel multiply-add (PMADD) operations, at 1.25 GHz under worst-case conditions and 1.67 GHz under typical-case conditions. Compared with the MAC in a 32-bit MIPS (microprocessor without interlocked pipeline stages) processor, the proposed design is significantly faster. Moreover, it achieves a throughput improvement of up to 32%. The MAC has been fabricated in Taiwan Semiconductor Manufacturing Company (TSMC) 90-nm CMOS standard-cell technology and has passed a functional test.
Pages: 1067-1074
Number of pages: 8
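The abstract names three operation modes (1-way 32-bit, 4-way 16-bit, 2-way PMADD) and credits compressors and a partial product reduction tree for the speed-up. The Python sketch below is a purely behavioral, bit-accurate model of those modes under stated assumptions; it is not the authors' circuit. Assumptions: partial products come from a plain AND array (no Booth recoding); reduction uses generic 4:2 compressors built from two cascaded 3:2 carry-save stages rather than the paper's interleaved-adder circuit; signed products use a sign-magnitude shortcut; PMADD is assumed to follow MMX `PMADDWD`-like semantics (signed 16-bit products, adjacent pairs summed); accumulator width and overflow are not modeled. All function names are hypothetical.

```python
"""Behavioral sketch of a split MAC (hypothetical model, not the paper's circuit)."""


def to_signed(x: int, bits: int) -> int:
    """Interpret the low `bits` bits of x as a two's-complement integer."""
    x &= (1 << bits) - 1
    return x - (1 << bits) if x >> (bits - 1) else x


def csa(x: int, y: int, z: int) -> tuple[int, int]:
    """Word-level 3:2 carry-save adder: x + y + z == s + c."""
    s = x ^ y ^ z                                  # per-bit sum
    c = ((x & y) | (x & z) | (y & z)) << 1         # per-bit majority, shifted as carry
    return s, c


def compress_4_2(a: int, b: int, c: int, d: int) -> tuple[int, int]:
    """4:2 compressor as two cascaded 3:2 stages: a + b + c + d == s + k."""
    s1, c1 = csa(a, b, c)
    return csa(s1, c1, d)


def pprt_multiply(a: int, b: int, width: int) -> int:
    """Unsigned a*b: AND-array partial products reduced by a 4:2 / 3:2 tree."""
    rows = [a << i if (b >> i) & 1 else 0 for i in range(width)]
    while len(rows) > 2:
        nxt = []
        while len(rows) >= 4:                      # compress groups of four rows
            nxt.extend(compress_4_2(*rows[:4]))
            rows = rows[4:]
        if len(rows) == 3:
            nxt.extend(csa(*rows))
        else:
            nxt.extend(rows)                       # 0, 1 or 2 leftover rows pass through
        rows = nxt
    return sum(rows)                               # final carry-propagate addition


def multiply(a: int, b: int, bits: int, signed: bool) -> int:
    """bits x bits multiply; signed mode uses a sign-magnitude shortcut."""
    if not signed:
        mask = (1 << bits) - 1
        return pprt_multiply(a & mask, b & mask, bits)
    sa, sb = to_signed(a, bits), to_signed(b, bits)
    p = pprt_multiply(abs(sa), abs(sb), bits)
    return -p if (sa < 0) != (sb < 0) else p


def mac32(acc: int, a: int, b: int, signed: bool) -> int:
    """1-way 32-bit multiply-accumulate (accumulator width not modeled)."""
    return acc + multiply(a, b, 32, signed)


def mac16x4(accs: list[int], a_lanes: list[int], b_lanes: list[int],
            signed: bool) -> list[int]:
    """4-way 16-bit multiply-accumulate: one independent product per lane."""
    return [acc + multiply(a, b, 16, signed)
            for acc, a, b in zip(accs, a_lanes, b_lanes)]


def pmadd16x2(a_lanes: list[int], b_lanes: list[int]) -> list[int]:
    """2-way PMADD (assumed PMADDWD-like): a0*b0 + a1*b1 and a2*b2 + a3*b3."""
    p = [multiply(a, b, 16, True) for a, b in zip(a_lanes, b_lanes)]
    return [p[0] + p[1], p[2] + p[3]]


if __name__ == "__main__":
    print(pmadd16x2([1, 2, 3, 4], [5, 6, 7, 8]))  # [17, 53]
```

The sketch also makes one property of a split structure visible: a 32×32 array and four 16×16 arrays both hold 1024 partial-product bits, so the same reduction hardware can plausibly be shared between the two precisions.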