The Floating-Point Unit of the Jaguar x86 Core

Cited by: 16

Authors:

Rupley, Jeff [1]

King, John [1]

Quinnell, Eric [1]

Galloway, Frank [1]

Patton, Ken [1]

Seidel, Peter-Michael [1]

Dinh, James [1]

Bui, Hai [1]

Bhowmik, Anasua [1]

Affiliation:

[1] AMD Austin & Bangalore, Bangalore, Karnataka, India

Source:

2013 21st IEEE Symposium on Computer Arithmetic (ARITH) | 2013

Keywords:

AMD Jaguar; floating-point unit; x87; SSE; AVX; MMX; AES; CLMUL; F16C; industry implementation

DOI:

10.1109/ARITH.2013.24

Chinese Library Classification (CLC) number:

TP39 [Computer applications]

Discipline classification codes:

081203; 0835

Abstract:

The AMD Jaguar x86 core uses a fully synthesized, 128-bit native floating-point unit (FPU) built as a co-processor model. The Jaguar FPU supports several x86 ISA extensions, including the x87, MMX, SSE1 through SSE4.2, AES, CLMUL, AVX, and F16C instruction sets. The front end of the unit decodes two complex operations per cycle and uses a dedicated renamer (RN), free list (FL), and retire queue (RQ) for in-order dispatch and retire. The FPU issues to the execution units with a dedicated out-of-order, dual-issue scheduler. Execution units source operands from a synthesized physical register file (PRF) and bypass network. The back end of the unit has two execution pipes: the first pipe contains a vector integer ALU, a vector integer MUL unit, and a floating-point adder (FPA); the second pipe contains a vector integer ALU, a store-convert unit, and a floating-point iterative multiplier (FPM). The implementation of the unit focused on low-power design and on vectorized single-precision (SP) performance optimizations. The verification of the unit required complex pseudo-random and formal verification techniques. The Jaguar FPU is built in a 28 nm CMOS process.
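The interplay of renamer, free list, and retire queue named in the abstract can be illustrated with a small software model. The sketch below is a generic textbook-style renaming scheme, not the Jaguar design: the class name `Renamer`, its methods, and the register counts are hypothetical, and the abstract does not give the actual structure sizes or policies.

```python
from collections import deque


class Renamer:
    """Toy register renamer with a free list and an in-order retire queue.

    Hypothetical illustration only; not the Jaguar FPU implementation.
    """

    def __init__(self, num_arch_regs, num_phys_regs):
        if num_phys_regs <= num_arch_regs:
            raise ValueError("need more physical than architectural registers")
        # Architectural register i initially maps to physical register i.
        self.map_table = list(range(num_arch_regs))
        # The remaining physical registers start on the free list.
        self.free_list = deque(range(num_arch_regs, num_phys_regs))
        # Retire queue entries are (dest_arch, new_phys, old_phys), in program order.
        self.retire_queue = deque()

    def dispatch(self, dest, srcs):
        """Rename one operation in program order.

        Returns (new_phys_dest, phys_srcs), or None if dispatch must stall.
        """
        if not self.free_list:
            return None  # no free physical register available
        # Look up sources before remapping the destination.
        phys_srcs = [self.map_table[s] for s in srcs]
        new_phys = self.free_list.popleft()
        old_phys = self.map_table[dest]
        self.map_table[dest] = new_phys
        self.retire_queue.append((dest, new_phys, old_phys))
        return new_phys, phys_srcs

    def retire(self):
        """Retire the oldest operation and free the register it superseded."""
        if not self.retire_queue:
            return None
        _dest, _new_phys, old_phys = self.retire_queue.popleft()
        self.free_list.append(old_phys)
        return old_phys


if __name__ == "__main__":
    rn = Renamer(num_arch_regs=4, num_phys_regs=6)
    print(rn.dispatch(dest=0, srcs=[1, 2]))
    print(rn.dispatch(dest=1, srcs=[0, 0]))
    print(rn.dispatch(dest=2, srcs=[1]))  # stalls: free list is empty
    print(rn.retire())                    # frees the old mapping of register 0
    print(rn.dispatch(dest=2, srcs=[1]))
```

In this model, dispatch and retire both proceed in program order, while anything between them (the out-of-order scheduler and execution pipes in the abstract) is left out entirely.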

Citation

Pages: 7-16

Page count: 10
