BlockQNN: Efficient Block-Wise Neural Network Architecture Generation

Cited by: 74
Authors
Zhong, Zhao [1]
Yang, Zichen [2]
Deng, Boyang [2]
Yan, Junjie [2]
Wu, Wei [2]
Shao, Jing [2]
Liu, Cheng-Lin [3,4]
Affiliations
[1] Univ Chinese Acad Sci, Inst Automat, Chinese Acad Sci, NLPR, Beijing 100190, Peoples R China
[2] Sensetime Res Inst, SenseTime Grp Ltd, Beijing, Peoples R China
[3] Chinese Acad Sci, Inst Automat, NLPR, Beijing, Peoples R China
[4] Univ Chinese Acad Sci, CAS Ctr Excellence Brain Sci & Intelligence, Beijing 100190, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Computer architecture; Task analysis; Neural networks; Network architecture; Graphics processing units; Acceleration; Indexes; Convolutional neural network; neural architecture search; AutoML; reinforcement learning; Q-learning;
DOI
10.1109/TPAMI.2020.2969193
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Convolutional neural networks have achieved remarkable success in computer vision. However, most popular network architectures are hand-crafted and usually require expertise and elaborate design. In this paper, we present a block-wise network generation pipeline, called BlockQNN, which automatically builds high-performance networks using the Q-learning paradigm with an epsilon-greedy exploration strategy. The optimal network block is constructed by a learning agent trained to choose component layers sequentially; we then stack this block to construct the complete auto-generated network. To accelerate the generation process, we also propose a distributed asynchronous framework and an early-stop strategy. The block-wise generation brings unique advantages: (1) it yields state-of-the-art results compared with hand-crafted networks on image classification; in particular, the best network generated by BlockQNN achieves a 2.35 percent top-1 error rate on CIFAR-10; (2) it offers a tremendous reduction of the search space in designing networks, spending only 3 days with 32 GPUs, while a faster version yields a comparable result with only 1 GPU in 20 hours; (3) it has strong generalizability, in that a network built on CIFAR also performs well on the larger-scale ImageNet dataset, where the best network achieves a very competitive 82.0 percent top-1 and 96.0 percent top-5 accuracy.
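To make the search procedure described in the abstract concrete, below is a minimal Python sketch of epsilon-greedy Q-learning for sequential layer selection. It is an illustration under assumptions, not the authors' implementation: the layer vocabulary, the tuple-based state encoding, and all hyperparameter values are hypothetical, and in the actual pipeline the reward comes from training each stacked network (using the early-stop proxy) rather than the placeholder used here.

import random

# Illustrative layer vocabulary. The paper's actual action space (its
# Network Structure Code) is richer, also encoding layer indices and
# predecessor links; these names are assumptions for the sketch.
ACTIONS = ["conv3x3", "conv5x5", "maxpool3x3", "avgpool3x3", "identity", "terminal"]

MAX_LAYERS = 23          # assumed cap on block depth
ALPHA, GAMMA = 0.1, 1.0  # assumed learning rate and discount factor

Q = {}  # Q[(state, action)] -> estimated value; state = tuple of chosen layers


def choose_action(state, epsilon):
    """Epsilon-greedy: explore a random layer, else exploit the best Q-value."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q.get((state, a), 0.0))


def sample_block(epsilon):
    """Roll out one block: a sequence of layer choices ending with 'terminal'."""
    trajectory, state = [], ()
    for _ in range(MAX_LAYERS):
        action = choose_action(state, epsilon)
        trajectory.append((state, action))
        if action == "terminal":
            break
        state = state + (action,)
    return trajectory


def update_q(trajectory, reward):
    """One-step Q-learning backup along a sampled trajectory.

    Only the final transition carries the reward (in the paper, the
    validation accuracy of the trained, stacked network); intermediate
    transitions receive zero reward in this simplified sketch.
    """
    for j, (state, action) in enumerate(trajectory):
        last = j == len(trajectory) - 1
        r = reward if last else 0.0
        if last:
            future = 0.0
        else:
            next_state = trajectory[j + 1][0]
            future = max(Q.get((next_state, a), 0.0) for a in ACTIONS)
        old = Q.get((state, action), 0.0)
        Q[(state, action)] = old + ALPHA * (r + GAMMA * future - old)


# One search iteration. In the paper this loop runs asynchronously across
# many GPUs, and epsilon is annealed from 1.0 toward a small value so the
# agent shifts gradually from exploration to exploitation.
trajectory = sample_block(epsilon=0.9)
reward = 0.90  # placeholder: measured early-stop validation accuracy
update_q(trajectory, reward)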
Pages: 2314-2328
Page count: 15