Architectural modifications to enhance the floating-point performance of FPGAs

Cited by: 24

Authors:

Beauchamp, Michael J. ^{[1]}

Hauck, Scott ^{[2]}

Underwood, Keith D. ^{[3]}

Hemmert, K. Scott ^{[3]}

Affiliations:

[1] MIPS Technol, Mountain View, CA 94043 USA

[2] Univ Washington, Dept Elect Engn, Seattle, WA 98195 USA

[3] Sandia Natl Labs, Albuquerque, NM 87185 USA

Source:

IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS | 2008 / Vol. 16 / No. 02

Funding:

National Science Foundation (USA);

Keywords:

field-programmable gate array (FPGA); floating-point arithmetic; reconfigurable architecture;

DOI:

10.1109/TVLSI.2007.912041

Chinese Library Classification number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

With the density of field-programmable gate arrays (FPGAs) steadily increasing, FPGAs have reached the point where they are capable of implementing complex floating-point applications. However, their general-purpose nature has limited the use of FPGAs in scientific applications that require floating-point arithmetic due to the large amount of FPGA resources that floating-point operations still require. This paper considers three architectural modifications that make floating-point operations more efficient on FPGAs. The first modification embeds floating-point multiply-add units in an island-style FPGA. While offering a dramatic reduction in area and improvement in clock rate, these embedded units are a significant change and may not be justified by the market. The next two modifications target a major component of IEEE compliant floating-point computations: variable length shifters. The first alternative to lookup tables (LUTs) for implementing the variable length shifters is a coarse-grained approach: embedded variable length shifters in the FPGA fabric. These shifters offer a significant reduction in area with a modest increase in clock rate and are smaller and more general than embedded floating-point units. The next alternative is a fine-grained approach: adding a 4:1 multiplexer unit inside a configurable logic block (CLB), in parallel to each 4-LUT. While this offers the smallest overall area improvement, it does offer a significant improvement in clock rate with only a trivial increase in the size of the CLB.
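The fine-grained approach in the abstract can be illustrated with a small software model. A 4:1 multiplexer has six inputs (four data, two select), so it does not fit in a single 4-LUT, whereas a dedicated 4:1 multiplexer lets each shifter stage consume two bits of the shift amount. The sketch below is a hypothetical illustration and is not taken from the paper: the function names (`mux4`, `shift_right_radix4`, `to_bits`, `from_bits`) and the choice of a logical right shift are assumptions made for the example.

```python
def mux4(sel, d0, d1, d2, d3):
    """4:1 multiplexer: return one of four data inputs chosen by a 2-bit select."""
    return (d0, d1, d2, d3)[sel & 0b11]


def to_bits(value, width):
    """Unpack an integer into a list of bits, index 0 being the LSB."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """Pack a list of bits (index 0 = LSB) back into an integer."""
    return sum(b << i for i, b in enumerate(bits))


def shift_right_radix4(bits, amount):
    """Logical right shift built only from 4:1 multiplexer stages.

    Stage k shifts by 0, 1, 2 or 3 times 4**k, selected by bits
    (2k+1, 2k) of `amount`. Assumes amount < 4**(number of stages).
    """
    width = len(bits)
    current = list(bits)
    stage = 0
    while (1 << (2 * stage)) < width:
        step = 1 << (2 * stage)              # 1, 4, 16, ...
        sel = (amount >> (2 * stage)) & 0b11
        previous = current

        def tap(i, k):
            # Bit feeding mux input k of output position i; zeros shift in.
            j = i + k * step
            return previous[j] if j < width else 0

        current = [
            mux4(sel, tap(i, 0), tap(i, 1), tap(i, 2), tap(i, 3))
            for i in range(width)
        ]
        stage += 1
    return current


if __name__ == "__main__":
    word = to_bits(0b101100, 8)
    print(from_bits(shift_right_radix4(word, 2)))
```

Under these assumptions, a 24-bit operand needs three radix-4 stages (shift steps of 1, 4 and 16), compared with five stages when each stage is a 2:1 multiplexer.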

Pages: 177-187

Page count: 11
