NASA plus : Neural Architecture Search and Acceleration for Multiplication-Reduced Hybrid Networks

被引:3
作者
Shi, Huihong [1 ,2 ]
You, Haoran [3 ]
Wang, Zhongfeng [4 ]
Lin, Yingyan [3 ]
机构
[1] Georgia Inst Technol, Atlanta, GA 30332 USA
[2] Nanjing Univ, Sch Elect Sci & Engn, Nanjing 210093, Peoples R China
[3] Georgia Inst Technol, Sch Comp Sci, Atlanta, GA 30332 USA
[4] Nanjing Univ, Sch Elect Sci & Engn, Nanjing 210093, Peoples R China
关键词
Multiplication-reduced hybrid networks; neural architecture search; chunk-based accelerator; reconfigurable PE; algorithm-hardware co-design;
D O I
10.1109/TCSI.2023.3256700
中图分类号
TM [电工技术]; TN [电子技术、通信技术];
学科分类号
0808 ; 0809 ;
摘要
Multiplication is arguably the most computation-intensive operation in modern deep neural networks (DNNs), limiting their extensive deployment on resource-constrained devices. Thereby, pioneering works have handcrafted multiplication-free DNNs, which are hardware-efficient but generally inferior to their multiplication-based counterparts in task accuracy, calling for multiplication-reduced hybrid DNNs to marry the best of both worlds. To this end, we propose a Neural Architecture Search and Acceleration (NASA) framework for the above hybrid models, dubbed NASA+, to boost both task accuracy and hardware efficiency. Specifically, NASA+ augments the state-of-the-art (SOTA) search space with multiplication-free operators to construct hybrid ones, and then adopts a novel progressive pretraining strategy to enable the effective search. Furthermore, NASA+ develops a chunk-based accelerator with novel reconfigurable processing elements to better support searched hybrid models, and integrates an auto-mapper to search for optimal dataflows. Experimental results and ablation studies consistently validate the effectiveness of our NASA+ algorithm-hardware co-design framework, e.g., we can achieve up to 65.1% lower energy-delay-product with comparable accuracy over the SOTA multiplication-based system on CIFAR100. Codes are available at https://github.com/GATECH-EIC/NASA.
引用
收藏
页码:2523 / 2536
页数:14
相关论文
共 42 条
  • [1] Alwani M, 2016, INT SYMP MICROARCH
  • [2] Banner R., 2018, Advances in Neural Information Processing Systems, P5151
  • [3] An Energy-Efficient Domain-Specific Reconfigurable Array Processor With Heterogeneous PEs for Wearable Brain-Computer Interface SoCs
    Byun, Wooseok
    Je, Minkyu
    Kim, Ji-Hoon
    [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS, 2022, 69 (12) : 4872 - 4885
  • [4] Cai L., 2018, ARXIV
  • [5] AdderNet: Do We Really Need Multiplications in Deep Learning?
    Chen, Hanting
    Wang, Yunhe
    Xu, Chunjing
    Shi, Boxin
    Xu, Chao
    Tian, Qi
    Xu, Chang
    [J]. 2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2020, : 1465 - 1474
  • [6] Chen T., 2018, ARXIV
  • [7] Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks
    Chen, Yu-Hsin
    Emer, Joel
    Sze, Vivienne
    [J]. 2016 ACM/IEEE 43RD ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA), 2016, : 367 - 379
  • [8] Chen YH, 2016, ISSCC DIG TECH PAP I, V59, P262, DOI 10.1109/ISSCC.2016.7418007
  • [9] FBNetV3: Joint Architecture-Recipe Search using Predictor Pretraining
    Dai, Xiaoliang
    Wan, Alvin
    Zhang, Peizhao
    Wu, Bichen
    He, Zijian
    Wei, Zhen
    Chen, Kan
    Tian, Yuandong
    Yu, Matthew
    Vajda, Peter
    Gonzalez, Joseph E.
    [J]. 2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 16271 - 16280
  • [10] A Reconfigurable Streaming Deep Convolutional Neural Network Accelerator for Internet of Things
    Du, Li
    Du, Yuan
    Li, Yilei
    Su, Junjie
    Kuan, Yen-Cheng
    Liu, Chun-Chen
    Chang, Mau-Chung Frank
    [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS, 2018, 65 (01) : 198 - 208