Searching for A Robust Neural Architecture in Four GPU Hours

Cited by: 455
Authors
Dong, Xuanyi [1, 2]
Yang, Yi [1]
Affiliations
[1] Univ Technol Sydney, Sydney, NSW, Australia
[2] Baidu Res, Sunnyvale, CA USA
Source
2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019) | 2019
Keywords
DOI
10.1109/CVPR.2019.00186
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Conventional neural architecture search (NAS) approaches are based on reinforcement learning or evolutionary strategies and take more than 3000 GPU hours to find a good model on CIFAR-10. We propose an efficient NAS approach that learns to search by gradient descent. Our approach represents the search space as a directed acyclic graph (DAG). This DAG contains billions of sub-graphs, each of which corresponds to a neural architecture. To avoid traversing all possible sub-graphs, we develop a differentiable sampler over the DAG. The sampler is learnable and is optimized by the validation loss after training the sampled architecture. As a result, our approach, named Gradient-based search using Differentiable Architecture Sampler (GDAS), can be trained end to end by gradient descent. In experiments, one search procedure finishes in four GPU hours on CIFAR-10, and the discovered model obtains a test error of 2.82% with only 2.5M parameters, which is on par with the state of the art.
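The abstract does not say how the sampler is made differentiable. One common way to draw a discrete choice while keeping a smooth surrogate is the Gumbel-softmax relaxation, and the sketch below assumes that technique purely for illustration. The operation names, the three-edge DAG, the temperature `tau`, and the function names are all hypothetical and not taken from the record. The sketch covers only the forward sampling step: it picks one operation per edge and returns the soft probabilities that an autograd framework would differentiate through.

```python
import math
import random

# Hypothetical candidate operations for each edge of the DAG (assumption).
CANDIDATE_OPS = ["skip_connect", "conv_3x3", "conv_5x5", "max_pool_3x3"]


def gumbel_softmax_sample(logits, tau, rng):
    """Sample one categorical choice with Gumbel noise.

    Returns (hard_index, soft_probs). hard_index is the argmax of the
    noise-perturbed logits; soft_probs is the temperature-scaled softmax
    of the same perturbed logits, i.e. the smooth surrogate.
    """
    noisy = []
    for logit in logits:
        # Clamp the uniform draw away from 0 and 1 so both logs are finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / tau)
    peak = max(noisy)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    soft_probs = [e / total for e in exps]
    hard_index = max(range(len(noisy)), key=noisy.__getitem__)
    return hard_index, soft_probs


def sample_architecture(edge_logits, tau=1.0, seed=None):
    """Pick one operation per edge, i.e. one sub-graph of the DAG."""
    rng = random.Random(seed)
    arch = {}
    for edge, logits in edge_logits.items():
        index, probs = gumbel_softmax_sample(logits, tau, rng)
        arch[edge] = (CANDIDATE_OPS[index], probs)
    return arch


# Learnable sampler parameters: one logit per (edge, operation) pair.
# A toy DAG with three edges, all logits initialised to zero (assumption).
edge_logits = {
    (0, 1): [0.0, 0.0, 0.0, 0.0],
    (0, 2): [0.0, 0.0, 0.0, 0.0],
    (1, 2): [0.0, 0.0, 0.0, 0.0],
}

architecture = sample_architecture(edge_logits, tau=1.0, seed=0)
for edge, (op_name, probs) in architecture.items():
    print(edge, op_name, [round(p, 3) for p in probs])
```

In a full search loop, the sampled sub-graph would be trained on the training split, and the validation loss would then update `edge_logits` by gradient descent; that step needs an autograd library and is left out of this sketch.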
Pages: 1761-1770
Page count: 10