MNGNAS: Distilling Adaptive Combination of Multiple Searched Networks for One-Shot Neural Architecture Search

Cited by: 84
Authors
Chen, Zhihua [1 ]
Qiu, Guhao [1 ]
Li, Ping [2 ,3 ]
Zhu, Lei [4 ,5 ]
Yang, Xiaokang [6 ]
Sheng, Bin [7 ]
Affiliations
[1] East China Univ Sci & Technol, Dept Comp Sci & Engn, Shanghai 200237, Peoples R China
[2] Hong Kong Polytech Univ, Dept Comp, Hong Kong, Peoples R China
[3] Hong Kong Polytech Univ, Sch Design, Hong Kong, Peoples R China
[4] Hong Kong Univ Sci & Technol Guangzhou, ROAS Thrust, Guangzhou 511400, Peoples R China
[5] Hong Kong Univ Sci & Technol, Dept Elect & Comp Engn, Hong Kong, Peoples R China
[6] Shanghai Jiao Tong Univ, MoE Key Lab Artificial Intelligence, Dept Automat, Sch Elect Informat & Elect Engn,AI Inst, Shanghai 200240, Peoples R China
[7] Shanghai Jiao Tong Univ, Sch Elect Informat & Elect Engn, Dept Comp Sci & Engn, Shanghai 200240, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Computer architecture; Training; Neural networks; Search problems; Heuristic algorithms; Computational modeling; Knowledge engineering; Image recognition; knowledge distillation; multiple searched networks; neural architecture search;
DOI
10.1109/TPAMI.2023.3293885
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Neural architecture search (NAS) has recently attracted great interest in academia and industry, but it remains challenging due to the huge search space and high computational cost. Recent NAS studies have mainly focused on weight sharing to train a SuperNet once. However, the branch corresponding to each subnetwork is not guaranteed to be fully trained, which may not only incur large computational costs but also distort the architecture ranking in the retraining procedure. We propose a multi-teacher-guided NAS, which introduces an adaptive ensemble and a perturbation-aware knowledge distillation algorithm into one-shot NAS. An optimization method that seeks the optimal descent directions is used to obtain adaptive coefficients for the feature maps of the combined teacher model. In addition, we propose a dedicated knowledge distillation process for the optimal architecture and its perturbed variants at each search step, so that better feature maps are learned for later distillation. Comprehensive experiments verify that our approach is flexible and effective: it improves precision and search efficiency on a standard recognition dataset, and improves the correlation between the accuracy estimated by the search algorithm and the true accuracy on NAS benchmark datasets.
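The abstract describes learning adaptive coefficients that combine the feature maps of multiple teacher networks for distillation. The following is a minimal illustrative sketch of that general idea, not the paper's actual algorithm: it learns softmax-normalized coefficients by gradient descent so that the weighted combination of (flattened) teacher feature maps best matches a target feature map under an L2 objective. All function and variable names here are hypothetical.

```python
import numpy as np

def adaptive_teacher_combination(teacher_feats, target_feat, lr=0.1, steps=200):
    """Learn softmax-normalized coefficients alpha so that the weighted
    combination of teacher feature maps approximates the target feature
    map under an L2 (distillation-style) objective.

    teacher_feats: array of shape (k, d) -- k teachers, flattened features
    target_feat:   array of shape (d,)
    """
    k = teacher_feats.shape[0]
    logits = np.zeros(k)  # unconstrained parameters behind the softmax
    for _ in range(steps):
        alpha = np.exp(logits) / np.exp(logits).sum()   # softmax weights
        combined = alpha @ teacher_feats                # combined teacher, (d,)
        resid = combined - target_feat                  # residual, (d,)
        # gradient of 0.5 * ||combined - target||^2 w.r.t. alpha
        grad_alpha = teacher_feats @ resid              # (k,)
        # chain rule through the softmax:
        # d(alpha_i)/d(logit_j) = alpha_i * (delta_ij - alpha_j)
        grad_logits = alpha * (grad_alpha - alpha @ grad_alpha)
        logits -= lr * grad_logits
    return np.exp(logits) / np.exp(logits).sum()

# Toy demo: the second teacher matches the target exactly,
# so its coefficient should dominate after optimization.
rng = np.random.default_rng(0)
teachers = rng.normal(size=(3, 16))
target = teachers[1].copy()
alpha = adaptive_teacher_combination(teachers, target)
```

The softmax parameterization keeps the coefficients positive and summing to one, so the combined teacher stays a convex mixture of the individual teachers; the paper's own optimization over descent directions is more involved than this plain gradient descent.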
Pages: 13489-13508
Page count: 20
Related papers
61 records in total
[41] Tan MX. Proc CVPR IEEE, 2019: 2815. DOI: 10.1109/CVPR.2019.00293 (arXiv:1807.11626).
[42] Tan MX. Proc Mach Learn Res, 2019, 97.
[43] Tran L. Proc ICML Workshop Uncertainty & Robustness, 2020: 1.
[44] Wang DL. Proc Mach Learn Res, 2021, 139: 7769.
[45] Wang D, Li M, Gong C, Chandra V. AttentiveNAS: Improving Neural Architecture Search via Attentive Sampling. IEEE/CVF CVPR, 2021: 6414-6423.
[46] Wang L, Yoon K-J. Knowledge Distillation and Student-Teacher Learning for Visual Intelligence: A Review and New Outlooks. IEEE Trans Pattern Anal Mach Intell, 2022, 44(6): 3048-3068.
[47] Wu B, Dai X, Zhang P, Wang Y, Sun F, Wu Y, Tian Y, Vajda P, Jia Y, Keutzer K. FBNet: Hardware-Aware Efficient ConvNet Design via Differentiable Neural Architecture Search. IEEE/CVF CVPR, 2019: 10726-10734.
[48] Chu X, Zhou T, Zhang B, Li J. Fair DARTS: Eliminating Unfair Advantages in Differentiable Architecture Search. ECCV 2020, Part XV, LNCS 12360: 465-480.
[49] Xie S. Proc Int Conf Learn Representations, 2018: 1.
[50] Xie S, Girshick R, Dollar P, Tu Z, He K. Aggregated Residual Transformations for Deep Neural Networks. IEEE CVPR, 2017: 5987-5995.