Automated Hardware and Neural Network Architecture co-design of FPGA accelerators using multi-objective Neural Architecture Search

Cited by: 0
Authors
Colangelo, Philip [1 ]
Segal, Oren [2 ]
Speicher, Alex [2 ]
Margala, Martin [3 ]
Affiliations
[1] Intel PSG, San Jose, CA 95134 USA
[2] Hofstra Univ, Hempstead, NY 11550 USA
[3] Univ Massachusetts Lowell, Lowell, MA USA
Source
2020 IEEE 10TH INTERNATIONAL CONFERENCE ON CONSUMER ELECTRONICS (ICCE-BERLIN) | 2020
Keywords
Evolutionary Algorithms; Machine Learning; FPGA; Automated Design;
DOI
10.1109/ICCE-Berlin50680.2020.9352153
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Discipline Classification Codes
0808; 0809;
Abstract
State-of-the-art Neural Network Architectures (NNAs) are challenging to design and implement efficiently in hardware. In the past couple of years, this has led to an explosion in research and development of automatic Neural Architecture Search (NAS) tools. AutoML tools are now used to achieve state-of-the-art NNA designs and attempt to optimize for hardware usage and design. Much of the recent research in the auto-design of NNAs has focused on convolutional networks and image recognition, ignoring the fact that a significant part of the workload in data centers is general-purpose deep neural networks. In this work, we develop and test a general multilayer perceptron (MLP) flow that can take arbitrary datasets as input and automatically produce optimized NNAs and hardware designs. We test the flow on six benchmarks. Our results show we exceed the performance of currently published MLP accuracy results and are competitive with non-MLP-based results. We compare general and common GPU architectures with our scalable FPGA design and show we can achieve higher efficiency and higher throughput (outputs per second) for the majority of datasets. Further insights into the design space for both accurate networks and high-performing hardware show the power of co-design by correlating accuracy versus throughput, network size versus accuracy, and scaling to high-performance devices.
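The multi-objective evolutionary search the abstract describes can be illustrated with a minimal sketch. The snippet below is not the authors' tool: the architecture encoding (a tuple of hidden-layer widths), the mutation operator, and the `proxy_accuracy` stand-in (the real flow would train and evaluate each candidate network on the dataset) are all illustrative assumptions. Only the core idea is faithful: candidates are scored on two objectives, accuracy versus model size, and survivors are chosen by Pareto dominance rather than a single fitness value.

```python
import random

def param_count(layers, n_in=10, n_out=2):
    """Total weights + biases of an MLP with the given hidden-layer widths."""
    sizes = [n_in] + list(layers) + [n_out]
    return sum((a + 1) * b for a, b in zip(sizes, sizes[1:]))

def proxy_accuracy(layers):
    """Hypothetical stand-in for validation accuracy: more capacity helps,
    with diminishing returns. A real flow trains and evaluates each net."""
    cap = sum(layers)
    return cap / (cap + 64.0)

def dominates(a, b):
    """a = (accuracy, params) Pareto-dominates b when it is no worse on both
    objectives (maximize accuracy, minimize params) and strictly better on one."""
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better

def mutate(layers, rng):
    """Perturb one layer width; occasionally deepen the network."""
    layers = list(layers)
    i = rng.randrange(len(layers))
    layers[i] = max(1, layers[i] + rng.choice([-8, 8]))
    if rng.random() < 0.3 and len(layers) < 4:
        layers.append(rng.choice([16, 32, 64]))
    return tuple(layers)

def evolve(generations=30, pop_size=12, seed=0):
    """Evolve a population of MLP architectures toward the accuracy/size front."""
    rng = random.Random(seed)
    pop = [tuple(rng.choice([16, 32, 64]) for _ in range(rng.randint(1, 3)))
           for _ in range(pop_size)]
    for _ in range(generations):
        children = [mutate(rng.choice(pop), rng) for _ in range(pop_size)]
        union = list(set(pop) | set(children))
        scored = [(proxy_accuracy(c), param_count(c), c) for c in union]
        # keep the non-dominated front, padded with random survivors
        front = [c for (acc, p, c) in scored
                 if not any(dominates((a2, p2), (acc, p))
                            for (a2, p2, c2) in scored if c2 != c)]
        pop = front[:pop_size]
        while len(pop) < pop_size:
            pop.append(rng.choice(union))
    return sorted(set(pop), key=param_count)

if __name__ == "__main__":
    for arch in evolve():
        print(arch, param_count(arch), round(proxy_accuracy(arch), 3))
```

In the paper's co-design setting, the second objective would be a hardware metric (throughput or resource usage of the FPGA mapping) rather than a bare parameter count, but the dominance-based selection is the same.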
Pages: 6