Searching for A Robust Neural Architecture in Four GPU Hours

Cited by: 455

Authors:

Dong, Xuanyi [1, 2]

Yang, Yi [1]

Affiliations:

[1] University of Technology Sydney, Sydney, NSW, Australia

[2] Baidu Research, Sunnyvale, CA, USA

Source:

2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019) | 2019

Keywords:

DOI:

10.1109/CVPR.2019.00186

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Conventional neural architecture search (NAS) approaches are based on reinforcement learning or evolutionary strategy, which take more than 3000 GPU hours to find a good model on CIFAR-10. We propose an efficient NAS approach learning to search by gradient descent. Our approach represents the search space as a directed acyclic graph (DAG). This DAG contains billions of sub-graphs, each of which indicates a kind of neural architecture. To avoid traversing all the possibilities of the sub-graphs, we develop a differentiable sampler over the DAG. This sampler is learnable and optimized by the validation loss after training the sampled architecture. In this way, our approach can be trained in an end-to-end fashion by gradient descent, named Gradient-based search using Differentiable Architecture Sampler (GDAS). In experiments, we can finish one searching procedure in four GPU hours on CIFAR-10, and the discovered model obtains a test error of 2.82% with only 2.5M parameters, which is on par with the state-of-the-art.
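The abstract describes the differentiable sampler only at a high level. As a rough illustration, the sketch below assumes a Gumbel-softmax relaxation, one common way to make a discrete choice trainable by gradient descent. The operation names, the four-node DAG, the temperature `tau`, and the function names are all hypothetical and chosen for illustration. The gradient step on the validation loss is omitted because the standard library has no automatic differentiation.

```python
import math
import random

# Hypothetical candidate operations and a small DAG over 4 nodes (assumptions).
OPS = ["skip_connect", "conv_3x3", "max_pool_3x3"]
EDGES = [(i, j) for j in range(1, 4) for i in range(j)]  # every edge i -> j with i < j


def count_subgraphs(num_ops, num_edges):
    """Number of architectures when each edge picks exactly one operation."""
    return num_ops ** num_edges


def gumbel_softmax_sample(logits, tau, rng):
    """Return (soft probabilities, index of the sampled operation)."""
    noisy = []
    for a in logits:
        # Clamp the uniform draw away from 0 and 1 so both logs are defined.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))  # Gumbel(0, 1) noise
        noisy.append((a + gumbel) / tau)
    m = max(noisy)  # subtract the max for a numerically stable softmax
    exps = [math.exp(x - m) for x in noisy]
    total = sum(exps)
    soft = [e / total for e in exps]
    hard = soft.index(max(soft))  # discrete choice used in the forward pass
    return soft, hard


def sample_architecture(edge_logits, tau, rng):
    """Sample one sub-graph: a single operation for every edge of the DAG."""
    arch = {}
    for edge, logits in edge_logits.items():
        _, hard = gumbel_softmax_sample(logits, tau, rng)
        arch[edge] = OPS[hard]
    return arch


rng = random.Random(0)
# Learnable sampler parameters, initialised to zero (uniform sampling).
edge_logits = {edge: [0.0] * len(OPS) for edge in EDGES}
architecture = sample_architecture(edge_logits, tau=1.0, rng=rng)
total_subgraphs = count_subgraphs(len(OPS), len(EDGES))
```

Only one sub-graph is instantiated per sample, so the cost of a training step does not grow with the size of the search space. In a full implementation the soft probabilities would carry the gradient back to `edge_logits`.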


Pages: 1761-1770

Page count: 10

