Enabling Scientific Computing on Memristive Accelerators

Cited by: 68
Authors
Feinberg, Ben [1 ]
Vengalam, Uday Kumar Reddy [1 ]
Whitehair, Nathan [1 ]
Wang, Shibo [2 ]
Ipek, Engin [1 ,2 ]
Affiliations
[1] Univ Rochester, Dept Elect & Comp Engn, 601 Elmwood Ave, Rochester, NY 14627 USA
[2] Univ Rochester, Dept Comp Sci, Rochester, NY 14627 USA
Source
2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA) | 2018
Keywords
Accelerator Architectures; Resistive RAM
DOI
10.1109/ISCA.2018.00039
CLC number
TP3 [Computing technology, computer technology]
Discipline code
0812
Abstract
Linear algebra is ubiquitous across virtually every field of science and engineering, from climate modeling to macroeconomics. This ubiquity makes linear algebra a prime candidate for hardware acceleration, which can improve both the run time and the energy efficiency of a wide range of scientific applications. Recent work on memristive hardware accelerators shows significant potential to speed up matrix-vector multiplication (MVM), a critical linear algebra kernel at the heart of neural network inference tasks. Regrettably, the proposed hardware is constrained to a narrow range of workloads: although the eight- to 16-bit computations afforded by memristive MVM accelerators are acceptable for machine learning, they are insufficient for scientific computing, where high-precision floating point is the norm. This paper presents the first proposal to enable scientific computing on memristive crossbars. Three techniques are explored: reducing overheads by exploiting exponent range locality, early termination of fixed-point computation, and static operation scheduling. Together, these techniques enable a fixed-point memristive accelerator to perform high-precision floating-point computation without the exorbitant cost of naïve floating-point emulation on fixed-point hardware. A heterogeneous collection of crossbars with varying sizes is proposed to efficiently handle sparse matrices, and an algorithm for mapping the dense subblocks of a sparse matrix to an appropriate set of crossbars is investigated. The accelerator can be combined with existing GPU-based systems to handle datasets that the memristive accelerator alone cannot process efficiently. The proposed optimizations permit the memristive MVM concept to be applied to a wide range of problem domains, improving the execution time and energy dissipation of sparse linear solvers by 10.3x and 10.9x, respectively, over a purely GPU-based system.
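The abstract does not give implementation details, but the exponent-range-locality idea is closely related to block floating point: store fixed-point mantissas in a crossbar with one shared exponent per matrix block, perform the MVM in integer arithmetic, and rescale the result afterward. The Python sketch below illustrates only that concept; the function name crossbar_mvm_block_fp and the frac_bits parameter are hypothetical, and the actual accelerator operates on analog, bit-sliced crossbar arrays rather than a NumPy integer product.

    import numpy as np

    def crossbar_mvm_block_fp(A, x, frac_bits=16):
        # Hypothetical illustration of the shared-exponent ("exponent range
        # locality") idea: keep one exponent per matrix block, store fixed-point
        # mantissas (as a crossbar would), do an integer MVM, then rescale.
        block_exp = np.floor(np.log2(np.max(np.abs(A))))  # shared exponent for the block
        x_exp = np.floor(np.log2(np.max(np.abs(x))))      # shared exponent for the vector
        A_fixed = np.round(A / 2.0**block_exp * 2**frac_bits).astype(np.int64)
        x_fixed = np.round(x / 2.0**x_exp * 2**frac_bits).astype(np.int64)
        y_fixed = A_fixed @ x_fixed                       # integer dot products, as in a crossbar
        # Undo both shared exponents and the fixed-point scaling.
        return y_fixed * 2.0**(block_exp + x_exp) / 2.0**(2 * frac_bits)

    # Quantization error relative to a direct floating-point MVM stays small.
    rng = np.random.default_rng(0)
    A = rng.standard_normal((64, 64))
    x = rng.standard_normal(64)
    print(np.max(np.abs(crossbar_mvm_block_fp(A, x) - A @ x)))

In this sketch the shared exponents avoid per-element exponent handling inside the integer MVM, which is the overhead reduction the abstract attributes to exponent range locality.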
Pages: 367-382
Number of pages: 16