A3: Accelerating Attention Mechanisms in Neural Networks with Approximation

Cited by: 185
Authors
Ham, Tae Jun [1 ]
Jung, Sung Jun [1 ]
Kim, Seonghak [1 ]
Oh, Young H. [2 ]
Park, Yeonhong [1 ]
Song, Yoonho [1 ]
Park, Jung-Hun [1 ]
Lee, Sanghee [1 ]
Park, Kyoung [3 ]
Lee, Jae W. [1 ]
Jeong, Deog-Kyoon [1 ]
Affiliations
[1] Seoul Natl Univ, Seoul, South Korea
[2] Sungkyunkwan Univ, Seoul, South Korea
[3] SK Hynix, Ichon, South Korea
Source
2020 IEEE INTERNATIONAL SYMPOSIUM ON HIGH PERFORMANCE COMPUTER ARCHITECTURE (HPCA 2020) | 2020
Funding
National Research Foundation of Singapore;
Keywords
Attention Mechanism; Accelerators; Approximation; Neural Networks; Machine Learning; ASIC;
DOI
10.1109/HPCA47549.2020.00035
CLC classification number
TP3 [Computing technology and computer technology];
Discipline classification code
0812;
Abstract
With the increasing computational demands of neural networks, many hardware accelerators for neural networks have been proposed. Such existing accelerators often focus on popular network types such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs); however, little attention has been paid to attention mechanisms, an emerging neural network primitive that enables networks to retrieve the most relevant information from a knowledge base, external memory, or past states. The attention mechanism is widely adopted by many state-of-the-art neural networks for computer vision, natural language processing, and machine translation, and it accounts for a large portion of total execution time. We observe that today's practice of implementing this mechanism with matrix-vector multiplication is suboptimal: the attention mechanism is semantically a content-based search, so a large portion of the computation ends up not being used. Based on this observation, we design and architect A3, which accelerates attention mechanisms in neural networks with algorithmic approximation and hardware specialization. Our proposed accelerator achieves multiple orders of magnitude improvement in energy efficiency (performance/watt) as well as a substantial speedup over state-of-the-art conventional hardware.
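To make the abstract's observation concrete, the minimal sketch below (Python with NumPy; not code from the paper) contrasts exact dot-product attention, computed as matrix-vector products over every key and value row, with a simple top-k pruning that keeps only the highest-scoring candidates. The top-k heuristic, the function names, and the toy sizes are illustrative assumptions; A3's actual approximation is a hardware candidate-selection algorithm described in the paper, which also avoids computing the full score vector in the first place.

import numpy as np

def attention(query, keys, values):
    # Baseline dot-product attention: a matrix-vector product against every key,
    # a softmax over all scores, then a weighted sum over all value rows.
    scores = keys @ query
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ values

def approx_attention(query, keys, values, k=32):
    # Illustrative approximation (an assumption, not A3's algorithm): keep only
    # the k highest-scoring keys, renormalize their softmax weights, and skip
    # the remaining softmax / weighted-sum work.
    scores = keys @ query
    top = np.argpartition(scores, -k)[-k:]
    w = np.exp(scores[top] - scores[top].max())
    w /= w.sum()
    return w @ values[top]

# Toy example: 1024 key/value rows of dimension 64 and a single query vector.
rng = np.random.default_rng(0)
keys = rng.standard_normal((1024, 64))
values = rng.standard_normal((1024, 64))
query = rng.standard_normal(64)
print(np.linalg.norm(attention(query, keys, values) -
                     approx_attention(query, keys, values)))

Even this naive post-hoc pruning shows that most softmax weights contribute little to the output; that wasted work is the inefficiency the paper's approximation and hardware specialization target.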
Pages: 328-341
Page count: 14