Floating-point fused multiply-add: Reduced latency for floating-point addition

Cited by: 28
Authors
Bruguera, JD [1]
Lang, T [1]
Affiliations
[1] Univ Santiago de Compostela, Dept Elect & Comp Engn, Santiago De Compostela 15706, Spain
Source
17TH IEEE SYMPOSIUM ON COMPUTER ARITHMETIC, PROCEEDINGS | 2005
DOI
10.1109/ARITH.2005.22
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
In this paper we propose an architecture for the double-precision floating-point multiply-add fused (MAF) operation A + (B × C) that computes floating-point addition with lower latency than floating-point multiplication and MAF. While previous MAF architectures compute the three operations with the same latency, the proposed architecture allows an addition to skip the first pipeline stages, those related to the multiplication B × C. For instance, for a MAF unit pipelined into three or five stages, the latency of the floating-point addition is reduced to two or three cycles, respectively. To achieve this latency reduction, the alignment shifter, which in previous organizations operates in parallel with the multiplication, is moved so that the multiplication can be bypassed. To keep this modification from lengthening the critical path, a double-datapath organization is used, in which the alignment and the normalization are in separate paths. Moreover, we use the previously developed techniques of combining the addition with the rounding and of performing the normalization before the addition.
Pages: 42-51
Page count: 10
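
Illustrative note (not part of the original record): the abstract concerns the fused operation A + (B × C) and the special case of plain addition. The Python sketch below is only a software reference model under stated assumptions, not the authors' hardware design. It assumes that "fused" means a single rounding of the exact result, and that a conventional MAF unit obtains A + C by setting B = 1, which is why the multiplication stages carry no information for an addition. The names maf_reference and unfused are hypothetical.

```python
from fractions import Fraction


def maf_reference(a: float, b: float, c: float) -> float:
    """Reference model of A + (B x C) with a single final rounding.

    Every finite double is an exact rational, so the sum and product are
    computed exactly and rounded once on conversion back to float.
    (Assumption: finite inputs and no overflow in the result.)
    """
    exact = Fraction(a) + Fraction(b) * Fraction(c)
    return float(exact)


def unfused(a: float, b: float, c: float) -> float:
    """Separate multiply, then add: the product is rounded before the sum."""
    return a + b * c


# B x C = 1 - 2**-60 exactly, which rounds to 1.0 as a double.
a = -1.0
b = 1.0 + 2.0**-30
c = 1.0 - 2.0**-30

fused_result = maf_reference(a, b, c)   # keeps the -2**-60 term
unfused_result = unfused(a, b, c)       # product rounds to 1.0, sum is 0.0

# Addition as the special case B = 1: the product stage is an identity.
add_via_maf = maf_reference(0.1, 1.0, 0.2)

print(fused_result, unfused_result, add_via_maf == 0.1 + 0.2)
```

In this model the B = 1 case returns exactly the ordinary rounded sum, which is consistent with the abstract's point that an addition does not need the stages devoted to B × C.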