Differentiable neural architecture learning for efficient neural networks

Cited by: 30
Authors
Guo, Qingbei [1 ,2 ]
Wu, Xiao-Jun [1 ]
Kittler, Josef [3 ]
Feng, Zhiquan [2 ]
Affiliations
[1] Jiangnan Univ, Jiangsu Prov Engn Lab Pattern Recognit & Computat, Wuxi 214122, Jiangsu, Peoples R China
[2] Univ Jinan, Shandong Prov Key Lab Network Based Intelligent C, Jinan 250022, Peoples R China
[3] Univ Surrey, Ctr Vis Speech & Signal Proc, Guildford GU2 7XH, Surrey, England
Funding
Engineering and Physical Sciences Research Council (UK); National Natural Science Foundation of China;
Keywords
Deep neural network; Convolutional neural network; Neural architecture search; Automated machine learning;
DOI
10.1016/j.patcog.2021.108448
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405;
Abstract
Efficient neural networks have received ever-increasing attention with the evolution of convolutional neural networks (CNNs), especially regarding their deployment on embedded and mobile platforms. One of the biggest obstacles to obtaining such efficient neural networks is computational cost: even recent differentiable neural architecture search (DNAS) methods still need to sample a number of candidate neural architectures from which the optimal architecture is selected. To address this computational efficiency issue, we introduce a novel architecture parameterization based on a scaled sigmoid function, and propose a general Differentiable Neural Architecture Learning (DNAL) method that obtains efficient neural networks without evaluating candidate networks. Specifically, for stochastic supernets as well as conventional CNNs, we build a new channel-wise module layer whose architecture components are controlled by a scaled sigmoid function. We train these neural network models from scratch. The network optimization is decoupled into weight optimization and architecture optimization, which avoids the interaction between the two types of parameters and alleviates the vanishing gradient problem. We address the non-convex optimization problem of efficient neural networks by the continuous scaled sigmoid method instead of the common softmax method. Extensive experiments demonstrate that our DNAL method delivers superior performance in terms of efficiency and adapts to conventional CNNs (e.g., VGG16 and ResNet50), lightweight CNNs (e.g., MobileNetV2) and stochastic supernets (e.g., ProxylessNAS). The optimal neural networks learned by DNAL surpass those produced by state-of-the-art methods on the benchmark CIFAR-10 and ImageNet-1K datasets in accuracy, model size and computational complexity. Our source code is available at https://github.com/QingbeiGuo/DNAL.git. (c) 2022 Elsevier Ltd. All rights reserved.
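The abstract describes a channel-wise module whose architecture components are gated by a scaled sigmoid, with the scale sharpened during training so the soft gate approaches a binary channel selection. The sketch below is a minimal, hypothetical reconstruction of that idea; the module name ScaledSigmoidGate, the parameter names alpha and beta, and the annealing schedule are assumptions for illustration, not the authors' released implementation (see the linked repository for that).

```python
# Minimal sketch of a channel-wise scaled-sigmoid gate (assumed names, not the paper's code).
import torch
import torch.nn as nn


class ScaledSigmoidGate(nn.Module):
    """Gates each channel with sigmoid(beta * alpha_c).

    As the scale beta grows, the gate approaches a hard 0/1 indicator, so the
    learned architecture parameters alpha_c effectively decide which channels
    are kept or pruned.
    """

    def __init__(self, num_channels: int, init_alpha: float = 0.5):
        super().__init__()
        # Architecture parameters (one per channel), optimized separately
        # from the convolutional weights.
        self.alpha = nn.Parameter(torch.full((num_channels,), init_alpha))
        # Sigmoid scale; kept as a buffer and annealed externally, not learned.
        self.register_buffer("beta", torch.tensor(1.0))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gate = torch.sigmoid(self.beta * self.alpha)  # shape (C,)
        return x * gate.view(1, -1, 1, 1)             # broadcast over N, H, W


if __name__ == "__main__":
    # Toy usage: a conv layer followed by the gate. Putting conv weights and
    # alpha into separate optimizers mirrors the decoupled weight/architecture
    # optimization described in the abstract.
    layer = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), ScaledSigmoidGate(16))
    out = layer(torch.randn(2, 3, 32, 32))
    print(out.shape)  # torch.Size([2, 16, 32, 32])
    # Example annealing step: gradually sharpen the sigmoid toward a step function.
    layer[1].beta.fill_(layer[1].beta.item() * 1.05)
```

The gate is differentiable for any finite beta, which is what allows gradient-based architecture learning; only the continuation toward large beta recovers a discrete keep/prune decision.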
Pages: 12