Scalable hardware-algorithms for binary prefix sums

被引:14
作者
Lin, R [1 ]
Nakano, K
Olariu, S
Pinotti, MC
Schwing, JL
Zomaya, AY
机构
[1] SUNY Coll Geneseo, Dept Comp Sci, Geneseo, NY 14454 USA
[2] Nagoya Inst Technol, Dept Elect & Comp Engn, Showa Ku, Nagoya, Aichi 4668555, Japan
[3] Old Dominion Univ, Dept Comp Sci, Norfolk, VA 23529 USA
[4] CNR, IEI, I-56100 Pisa, Italy
[5] Cent Washington Univ, Dept Comp Sci, Ellensburg, WA 98926 USA
[6] Univ Western Australia, Dept Elect & Elect Engn, Parallel Comp Res Lab, Perth, WA 6907, Australia
基金
美国国家航空航天局; 美国国家科学基金会;
关键词
hardware-algorithms; shift switching; binary prefix sums; binary counting; scalable architectures; pipelining;
D O I
10.1109/71.877941
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
In this work, we address the problem of designing efficient and scalable hardware-algorithms for computing the sum and prefix sums of a omega(k)-bit, (k greater than or equal to 2), sequence using as basic building blocks linear arrays of at most omega(2) shift switches, where omega is a small power of 2. An immediate consequence of this feature is that in our designs broadcasts are limited to buses of length at most omega(2). We adopt a VLSI delay model where the "length" of a bus is proportional with the number of devices on the bus. We begin by discussing a hardware-algorithm that computes the sum of a omega(k)-bit binary sequence in the time of 2k - 2 broadcasts, while the corresponding prefix sums can be computed in the time of 3k - 4 broadcasts. Quite remarkably, in spite of the fact that our hardware-algorithm uses only linear arrays of size at most w?, the total number of broadcasts involved is less than three times the number required by an "ideal" design. We then go on to propose a second hardware-algorithm, operating in pipelined fashion, that computes the sum of a k omega(k)-bit binary sequence in the time of 3k + [log(omega) k] - 3 broadcasts. Using this design, the corresponding prefix sums can be computed in the time of 4k + [log(omega) k] - 5 broadcasts.
引用
收藏
页码:838 / 850
页数:13
相关论文
共 35 条
[1]  
Akl S.G., 1997, Parallel Computation: Models and Methods
[2]  
ALNUWEIRI H, 1994, P WORKSH REC ARCH AP, P1
[3]   SCANS AS PRIMITIVE PARALLEL OPERATIONS [J].
BLELLOCH, GE .
IEEE TRANSACTIONS ON COMPUTERS, 1989, 38 (11) :1526-1538
[4]   A REGULAR LAYOUT FOR PARALLEL ADDERS [J].
BRENT, RP ;
KUNG, HT .
IEEE TRANSACTIONS ON COMPUTERS, 1982, 31 (03) :260-264
[5]  
CAVANAUGH JJF, 1984, DIGITAL COMPUTER ARC
[6]   Pipelined adders [J].
Dadda, L ;
Piuri, V .
IEEE TRANSACTIONS ON COMPUTERS, 1996, 45 (03) :348-356
[7]  
HALSALL F, 1996, DATA COMMUNICATIONS
[8]  
Katz R. H., 1994, CONT LOGIC DESIGN
[9]   LOW-POWER DESIGN TECHNIQUES FOR HIGH-PERFORMANCE CMOS ADDERS [J].
KO, UM ;
BALSARA, PT ;
LEE, W .
IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, 1995, 3 (02) :327-333
[10]   PARALLEL ALGORITHM FOR EFFICIENT SOLUTION OF A GENERAL CLASS OF RECURRENCE EQUATIONS [J].
KOGGE, PM ;
STONE, HS .
IEEE TRANSACTIONS ON COMPUTERS, 1973, C-22 (08) :786-793