BlockQNN: Efficient Block-Wise Neural Network Architecture Generation

Cited by: 74
Authors
Zhong, Zhao [1]
Yang, Zichen [2]
Deng, Boyang [2]
Yan, Junjie [2]
Wu, Wei [2]
Shao, Jing [2]
Liu, Cheng-Lin [3,4]
Affiliations
[1] Univ Chinese Acad Sci, Inst Automat, Chinese Acad Sci, NLPR, Beijing 100190, Peoples R China
[2] Sensetime Res Inst, SenseTime Grp Ltd, Beijing, Peoples R China
[3] Chinese Acad Sci, Inst Automat, NLPR, Beijing, Peoples R China
[4] Univ Chinese Acad Sci, CAS Ctr Excellence Brain Sci & Intelligence, Beijing 100190, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Computer architecture; Task analysis; Neural networks; Network architecture; Graphics processing units; Acceleration; Indexes; Convolutional neural network; neural architecture search; AutoML; reinforcement learning; Q-learning;
DOI
10.1109/TPAMI.2020.2969193
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Convolutional neural networks have achieved remarkable success in computer vision. However, most popular network architectures are hand-crafted and usually require expertise and elaborate design. In this paper, we present a block-wise network generation pipeline, called BlockQNN, which automatically builds high-performance networks using the Q-learning paradigm with an epsilon-greedy exploration strategy. The optimal network block is constructed by a learning agent trained to choose component layers sequentially; we then stack this block to construct the complete auto-generated network. To accelerate the generation process, we also propose a distributed asynchronous framework and an early-stop strategy. The block-wise generation brings unique advantages: (1) it yields state-of-the-art results compared with hand-crafted networks on image classification; in particular, the best network generated by BlockQNN achieves a 2.35 percent top-1 error rate on CIFAR-10; (2) it offers a tremendous reduction of the search space in designing networks, spending only 3 days with 32 GPUs, while a faster version yields a comparable result with only 1 GPU in 20 hours; (3) it has strong generalizability, in that a network built on CIFAR also performs well on the larger-scale ImageNet dataset, where the best network achieves a very competitive 82.0 percent top-1 and 96.0 percent top-5 accuracy.
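To make the search procedure described in the abstract concrete, below is a minimal Python sketch of epsilon-greedy Q-learning for sequential layer selection. It is an illustration under assumptions, not the authors' implementation: the layer vocabulary, the tuple-based state encoding, and all hyperparameter values are hypothetical, and in the actual pipeline the reward comes from training each stacked network (using the early-stop proxy) rather than the placeholder used here.

import random

# Illustrative layer vocabulary. The paper's actual action space (its
# Network Structure Code) is richer, also encoding layer indices and
# predecessor links; these names are assumptions for the sketch.
ACTIONS = ["conv3x3", "conv5x5", "maxpool3x3", "avgpool3x3", "identity", "terminal"]

MAX_LAYERS = 23          # assumed cap on block depth
ALPHA, GAMMA = 0.1, 1.0  # assumed learning rate and discount factor

Q = {}  # Q[(state, action)] -> estimated value; state = tuple of chosen layers


def choose_action(state, epsilon):
    """Epsilon-greedy: explore a random layer, else exploit the best Q-value."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q.get((state, a), 0.0))


def sample_block(epsilon):
    """Roll out one block: a sequence of layer choices ending with 'terminal'."""
    trajectory, state = [], ()
    for _ in range(MAX_LAYERS):
        action = choose_action(state, epsilon)
        trajectory.append((state, action))
        if action == "terminal":
            break
        state = state + (action,)
    return trajectory


def update_q(trajectory, reward):
    """One-step Q-learning backup along a sampled trajectory.

    Only the final transition carries the reward (in the paper, the
    validation accuracy of the trained, stacked network); intermediate
    transitions receive zero reward in this simplified sketch.
    """
    for j, (state, action) in enumerate(trajectory):
        last = j == len(trajectory) - 1
        r = reward if last else 0.0
        if last:
            future = 0.0
        else:
            next_state = trajectory[j + 1][0]
            future = max(Q.get((next_state, a), 0.0) for a in ACTIONS)
        old = Q.get((state, action), 0.0)
        Q[(state, action)] = old + ALPHA * (r + GAMMA * future - old)


# One search iteration. In the paper this loop runs asynchronously across
# many GPUs, and epsilon is annealed from 1.0 toward a small value so the
# agent shifts gradually from exploration to exploitation.
trajectory = sample_block(epsilon=0.9)
reward = 0.90  # placeholder: measured early-stop validation accuracy
update_q(trajectory, reward)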
Pages: 2314-2328
Page count: 15