NASA+: Neural Architecture Search and Acceleration for Multiplication-Reduced Hybrid Networks

Cited by: 3

Authors:

Shi, Huihong [1,2]

You, Haoran [3]

Wang, Zhongfeng [4]

Lin, Yingyan [3]

Affiliations:

[1] Georgia Inst Technol, Atlanta, GA 30332 USA

[2] Nanjing Univ, Sch Elect Sci & Engn, Nanjing 210093, Peoples R China

[3] Georgia Inst Technol, Sch Comp Sci, Atlanta, GA 30332 USA

[4] Nanjing Univ, Sch Elect Sci & Engn, Nanjing 210093, Peoples R China

Source:

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS | 2023 / Vol. 70 / No. 06

Keywords:

Multiplication-reduced hybrid networks; neural architecture search; chunk-based accelerator; reconfigurable PE; algorithm-hardware co-design;

DOI:

10.1109/TCSI.2023.3256700

Chinese Library Classification (CLC) number:

TM [Electrical technology]; TN [Electronic technology, communication technology];

Subject classification code:

0808 ; 0809 ;

Abstract:

Multiplication is arguably the most computation-intensive operation in modern deep neural networks (DNNs), limiting their extensive deployment on resource-constrained devices. Consequently, pioneering works have handcrafted multiplication-free DNNs, which are hardware-efficient but generally inferior to their multiplication-based counterparts in task accuracy, calling for multiplication-reduced hybrid DNNs to marry the best of both worlds. To this end, we propose a Neural Architecture Search and Acceleration (NASA) framework for the above hybrid models, dubbed NASA+, to boost both task accuracy and hardware efficiency. Specifically, NASA+ augments the state-of-the-art (SOTA) search space with multiplication-free operators to construct hybrid ones, and then adopts a novel progressive pretraining strategy to enable effective search. Furthermore, NASA+ develops a chunk-based accelerator with novel reconfigurable processing elements to better support searched hybrid models, and integrates an auto-mapper to search for optimal dataflows. Experimental results and ablation studies consistently validate the effectiveness of our NASA+ algorithm-hardware co-design framework, e.g., we can achieve up to 65.1% lower energy-delay-product with comparable accuracy over the SOTA multiplication-based system on CIFAR100. Code is available at https://github.com/GATECH-EIC/NASA.
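
As a rough illustration of what "multiplication-free operators" can look like, the sketch below contrasts an ordinary dot product with two multiplication-free alternatives. It assumes adder-style (negative L1 distance, in the spirit of AdderNet) and shift-style (power-of-two weights) operators; the abstract does not say which operators NASA+ actually uses, and every function name here is hypothetical rather than taken from the NASA codebase.

```python
import math

def mult_dot(x, w):
    """Standard multiplication-based dot product, for reference."""
    return sum(xi * wi for xi, wi in zip(x, w))

def adder_dot(x, w):
    """Adder-style similarity (assumed form): negative L1 distance,
    computed with additions, subtractions and absolute values only."""
    return -sum(abs(xi - wi) for xi, wi in zip(x, w))

def quantize_pow2(w):
    """Round a nonzero weight to sign * 2**p and return (sign, p)."""
    sign = -1 if w < 0 else 1
    p = round(math.log2(abs(w)))
    return sign, p

def shift_dot(x_int, w):
    """Shift-style dot product (assumed form): non-negative integer
    activations, weights rounded to signed powers of two, so each
    product becomes a bit shift followed by an add or subtract."""
    total = 0
    for xi, wi in zip(x_int, w):
        sign, p = quantize_pow2(wi)
        shifted = xi << p if p >= 0 else xi >> -p
        total = total + shifted if sign > 0 else total - shifted
    return total

x = [4, 8, 2]
w = [2.0, 0.5, -1.0]   # already exact powers of two, so no rounding error
print(mult_dot(x, w), shift_dot(x, w), adder_dot(x, w))
```

When the weights are exact powers of two, as in this toy input, the shift-based result matches the multiplication-based one; in general the rounding in `quantize_pow2` introduces an approximation, which is one plausible reason hybrid models keep some multiplication-based layers.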


Pages: 2523-2536

Page count: 14

References: 42 entries in total

