MAHASIM: Machine-Learning Hardware Acceleration Using a Software-Defined Intelligent Memory System

Cited by: 0
Authors
Bahar Asgari
Saibal Mukhopadhyay
Sudhakar Yalamanchili
Affiliation
[1] Georgia Institute of Technology, School of Electrical and Computer Engineering
Source
Journal of Signal Processing Systems | 2021, Vol. 93
Keywords
Machine learning; Neural networks; Near-data-processing; Memory system
DOI
Not available
Abstract
As the computation in machine-learning applications grows along with the size of datasets, the energy and performance costs of data movement come to dominate those of compute. This issue is more pronounced in embedded systems with limited resources and energy. Although near-data processing (NDP) has been pursued as an architectural solution, comparatively little attention has been paid to scaling NDP for larger embedded machine-learning applications (e.g., speech and motion processing). We propose machine-learning hardware acceleration using a software-defined intelligent memory system (Mahasim). Mahasim is a scalable NDP-based memory system in which application performance scales with the size of the data. The building blocks of Mahasim are programmable memory slices, supported by data partitioning, compute-aware memory allocation, and an independent in-memory execution model. For recurrent neural networks, Mahasim achieves up to 537.95 GFLOPS/W energy efficiency and a 3.9x speedup as the system grows from 2 to 256 memory slices, indicating that Mahasim favors larger problems.
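The abstract's building blocks (data partitioning, compute-aware allocation, and independent per-slice execution) can be illustrated with a minimal software sketch. This is an assumption-laden analogy, not the paper's actual hardware scheme: the row-wise split, the function names, and the use of a matrix-vector product as the workload are all illustrative choices.

```python
import numpy as np

def partition_rows(W, n_slices):
    """Split weight matrix W row-wise into near-equal blocks, mimicking
    compute-aware allocation of one block per memory slice (assumed scheme)."""
    return np.array_split(W, n_slices, axis=0)

def slice_compute(block, x):
    """Work done locally inside one memory slice: a partial matrix-vector
    product over only the rows that slice owns, so the weights never move."""
    return block @ x

def ndp_matvec(W, x, n_slices):
    """Assemble y = W @ x from independent per-slice partial results.
    Each call to slice_compute models one slice's in-memory execution."""
    partials = [slice_compute(b, x) for b in partition_rows(W, n_slices)]
    return np.concatenate(partials)
```

Because each slice's partial result depends only on its own block and the shared input vector, the slices can run independently, which is the property that lets performance scale with the number of slices.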
Pages: 659-675 (16 pages)