HAO: Hardware-aware Neural Architecture Optimization for Efficient Inference

Cited by: 31
Authors
Dong, Zhen [1 ]
Gao, Yizhao [2 ]
Huang, Qijing [1 ]
Wawrzynek, John [1 ]
So, Hayden K. H. [2 ]
Keutzer, Kurt [1 ]
Affiliations
[1] Univ Calif Berkeley, Berkeley, CA 94720 USA
[2] Univ Hong Kong, Hong Kong, Peoples R China
Source
2021 IEEE 29TH ANNUAL INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES (FCCM 2021) | 2021
Keywords
DOI
10.1109/FCCM51124.2021.00014
CLC number
TP3 [Computing technology, computer technology];
Discipline code
0812;
Abstract
Automatic algorithm-hardware co-design for DNNs has shown great success in improving the performance of DNNs on FPGAs. However, this process remains challenging due to the intractable search space of neural network architectures and hardware accelerator implementations. Differing from existing hardware-aware neural architecture search (NAS) algorithms that rely solely on expensive learning-based approaches, our work incorporates integer programming into the search algorithm to prune the design space. Given a set of hardware resource constraints, our integer programming formulation directly outputs the optimal accelerator configuration for mapping a DNN subgraph that minimizes latency. We use an accuracy predictor for different DNN subgraphs with different quantization schemes and generate accuracy-latency Pareto frontiers. With low computational cost, our algorithm can generate quantized networks that achieve state-of-the-art accuracy and hardware performance on a Xilinx Zynq (ZU3EG) FPGA for image classification on the ImageNet dataset. The solution found by our algorithm achieves 72.5% top-1 accuracy on ImageNet at a frame rate of 50, which is 60% faster than MnasNet [37] and 135% faster than FBNet [43] with comparable accuracy.
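The abstract's accuracy-latency Pareto frontier is the set of candidate designs not dominated by any other (no other candidate is both faster and more accurate). A minimal sketch of that filtering step in plain Python — the function name and the sample (latency, accuracy) pairs are illustrative, not from the paper:

```python
def pareto_frontier(candidates):
    """Return the accuracy-latency Pareto frontier.

    Each candidate is a (latency, accuracy) pair. A candidate is kept
    unless some other candidate has latency <= and accuracy >= with at
    least one strict inequality (i.e., it is dominated).
    """
    frontier = []
    for lat, acc in candidates:
        dominated = any(
            l <= lat and a >= acc and (l < lat or a > acc)
            for l, a in candidates
        )
        if not dominated:
            frontier.append((lat, acc))
    # Sort by latency so the frontier traces the accuracy-latency curve.
    return sorted(frontier)

# Illustrative (latency_ms, top1_accuracy) candidates -- not real data.
points = [(20.0, 0.725), (15.0, 0.70), (25.0, 0.71), (18.0, 0.69)]
print(pareto_frontier(points))  # the two non-dominated designs
```

In the paper's flow, latency would come from the integer-programming model of the accelerator mapping and accuracy from the learned predictor; this sketch only shows the final dominance filter over those pairs.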
Pages: 50-59
Page count: 10
References
55 entries in total
[1]  
Abdelfattah Mohamed S., 2020, FPGA '20: Proceedings of the 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, DOI 10.1145/3373087.3375334
[2]  
Abdelfattah M. S., 2020, ARXIV PREPRINT ARXIV
[3]  
[Anonymous], 2019, arXiv preprint arXiv:1906.05910
[4]  
[Anonymous], 2018, Proceedings of AAAI
[5]   Finite-time analysis of the multiarmed bandit problem [J].
Auer, P ;
Cesa-Bianchi, N ;
Fischer, P .
MACHINE LEARNING, 2002, 47 (2-3) :235-256
[6]   FINN-R: An End-to-End Deep-Learning Framework for Fast Exploration of Quantized Neural Networks [J].
Blott, Michaela ;
Preusser, Thomas B. ;
Fraser, Nicholas J. ;
Gambardella, Giulio ;
O'Brien, Kenneth ;
Umuroglu, Yaman ;
Leeser, Miriam ;
Vissers, Kees .
ACM TRANSACTIONS ON RECONFIGURABLE TECHNOLOGY AND SYSTEMS, 2018, 11 (03)
[7]   ZeroQ: A Novel Zero Shot Quantization Framework [J].
Cai, Yaohui ;
Yao, Zhewei ;
Dong, Zhen ;
Gholami, Amir ;
Mahoney, Michael W. ;
Keutzer, Kurt .
2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2020), 2020, :13166-13175
[8]   GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond [J].
Cao, Yue ;
Xu, Jiarui ;
Lin, Stephen ;
Wei, Fangyun ;
Hu, Han .
2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW), 2019, :1971-1980
[9]   Xception: Deep Learning with Depthwise Separable Convolutions [J].
Chollet, Francois .
30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, :1800-1807
[10]  
Courbariaux Matthieu, 2016, Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1