Low-latency High-throughput Multi-precision Fused Floating-point Division and Square-root Unit Design

Cited by: 0

Authors:

Dai, Liangtao [1]

Zhu, Haocheng [1]

Yuan, Binzhe [1]

Yang, Chao [1]

Wang, Yuan [2]

Lou, Xin [1]

Affiliations:

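[1] ShanghaiTech University, School of Information Science and Technology, Shanghai, People's Republic of China

[2] University of Electronic Science and Technology of China (UESTC), School of Integrated Circuit Science and Engineering, Chengdu, People's Republic of China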

Source:

2024 International VLSI Symposium on Technology, Systems and Applications (VLSI TSA), 2024

Keywords:

Goldschmidt algorithm; floating-point unit (FPU); square-root; division

DOI:

10.1109/VLSITSA60681.2024.10546355

Chinese Library Classification (CLC) numbers:

TM [Electrical engineering]; TN [Electronics and communication technology]

Discipline codes:

0808; 0809

Abstract:

This paper proposes a novel low-latency, high-throughput fused floating-point unit (FPU) for division (DIV) and square-root (SQRT) operations based on the Goldschmidt algorithm. Because DIV and SQRT are complex operations, traditional FPUs in commercial processors suffer from long latency, low throughput, and substantial hardware cost. In our design, we apply a new error analysis method to reduce multiplier bitwidths, and we carefully integrate DIV and SQRT to improve resource reuse. In addition, the pipelined structure provides multi-precision support while sustaining high throughput. We validate the design with 100 trillion random tests, demonstrating that its single-precision (SP) and double-precision (DP) results comply with the IEEE 754 standard. Results show that our design not only outperforms existing FPUs but also achieves significant resource reuse between DIV and SQRT operations.
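The abstract names the Goldschmidt algorithm without showing its iterations. As a rough illustration only, the textbook Goldschmidt recurrences for division and square root can be sketched in Python with double-precision floats. This is not the authors' design: the 16-entry seed table (`SEED_BITS`), the iteration count, and the function names `_seed`, `goldschmidt_div`, and `goldschmidt_sqrt` are hypothetical, and the reduced multiplier bitwidths from the paper's error analysis are not modeled. The sketch assumes the operands are already-normalized significands, in [1, 2) for division and in [1, 4) for square root (after adjusting the exponent to be even).

```python
import math

SEED_BITS = 4  # hypothetical seed-table size: 2**4 entries


def _seed(x, lo, hi, f):
    """Piecewise-constant initial approximation of f on [lo, hi).

    Evaluates f at the midpoint of one of 2**SEED_BITS equal sub-intervals,
    standing in for a small hardware lookup table (an assumption)."""
    n = 1 << SEED_BITS
    idx = min(int((x - lo) / (hi - lo) * n), n - 1)
    mid = lo + (idx + 0.5) * (hi - lo) / n
    return f(mid)


def goldschmidt_div(a, b, iters=4):
    """Approximate a / b for significands a, b in [1, 2)."""
    r = _seed(b, 1.0, 2.0, lambda m: 1.0 / m)
    x, y = a * r, b * r          # invariant x / y == a / b; x -> a/b, y -> 1
    for _ in range(iters):
        r = 2.0 - y              # correction factor
        x, y = x * r, y * r      # two independent multiplications
    return x


def goldschmidt_sqrt(b, iters=4):
    """Approximate sqrt(b) for b in [1, 4)."""
    y0 = _seed(b, 1.0, 4.0, lambda m: 1.0 / math.sqrt(m))
    g, h = b * y0, 0.5 * y0      # g -> sqrt(b), h -> 1 / (2 * sqrt(b))
    for _ in range(iters):
        r = 0.5 - g * h          # correction term, tends to 0
        g, h = g + g * r, h + h * r
    return g
```

Both loops multiply their running values by a correction factor obtained with a single subtraction, which suggests why DIV and SQRT can share multipliers in a fused unit. For division in exact arithmetic, the seed above has relative error below 2^-5, and each iteration squares it (below 2^-10, 2^-20, 2^-40, 2^-80), so three iterations already exceed the 24-bit SP significand while the 53-bit DP significand needs a fourth. This is one way a single pipeline could serve both precisions.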


Pages: 4
