MNGNAS: Distilling Adaptive Combination of Multiple Searched Networks for One-Shot Neural Architecture Search

Cited by: 84
Authors
Chen, Zhihua [1 ]
Qiu, Guhao [1 ]
Li, Ping [2 ,3 ]
Zhu, Lei [4 ,5 ]
Yang, Xiaokang [6 ]
Sheng, Bin [7 ]
Affiliations
[1] East China Univ Sci & Technol, Dept Comp Sci & Engn, Shanghai 200237, Peoples R China
[2] Hong Kong Polytech Univ, Dept Comp, Hong Kong, Peoples R China
[3] Hong Kong Polytech Univ, Sch Design, Hong Kong, Peoples R China
[4] Hong Kong Univ Sci & Technol Guangzhou, ROAS Thrust, Guangzhou 511400, Peoples R China
[5] Hong Kong Univ Sci & Technol, Dept Elect & Comp Engn, Hong Kong, Peoples R China
[6] Shanghai Jiao Tong Univ, MoE Key Lab Artificial Intelligence, Dept Automat, Sch Elect Informat & Elect Engn,AI Inst, Shanghai 200240, Peoples R China
[7] Shanghai Jiao Tong Univ, Sch Elect Informat & Elect Engn, Dept Comp Sci & Engn, Shanghai 200240, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Computer architecture; Training; Neural networks; Search problems; Heuristic algorithms; Computational modeling; Knowledge engineering; Image recognition; knowledge distillation; multiple searched networks; neural architecture search;
DOI
10.1109/TPAMI.2023.3293885
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recently, neural architecture search (NAS) has attracted great interest in academia and industry. It remains a challenging problem due to the huge search space and computational costs. Recent studies in NAS have mainly focused on weight sharing, training a SuperNet only once. However, the branch corresponding to each subnetwork is not guaranteed to be fully trained; this may not only incur huge computation costs but also affect the architecture ranking in the retraining procedure. We propose a multi-teacher-guided NAS, which introduces an adaptive ensemble and a perturbation-aware knowledge distillation algorithm into one-shot NAS. An optimization method that seeks the optimal descent directions is used to obtain adaptive coefficients for the feature maps of the combined teacher model. In addition, we propose a dedicated knowledge distillation process for the optimal and perturbed architectures in each search step, so that better feature maps are learned for later distillation. Comprehensive experiments verify that our approach is flexible and effective. We show improvements in precision and search efficiency on standard recognition datasets, and an improved correlation between the accuracy estimated by the search algorithm and the true accuracy on NAS benchmark datasets.
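The adaptive ensemble of teacher feature maps can be illustrated with a minimal sketch (PyTorch is assumed; the class name AdaptiveEnsembleDistiller and the treatment of the coefficients as plain learnable softmax weights are illustrative assumptions, not the paper's actual method, which derives the coefficients from an optimization over descent directions):

import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveEnsembleDistiller(nn.Module):
    """Illustrative sketch: combine teacher feature maps with learnable coefficients."""

    def __init__(self, num_teachers: int):
        super().__init__()
        # One logit per teacher; softmax keeps coefficients positive and summing to one.
        self.logits = nn.Parameter(torch.zeros(num_teachers))

    def forward(self, student_feat, teacher_feats):
        # teacher_feats: list of tensors with the same shape as student_feat (N, C, H, W).
        coeffs = F.softmax(self.logits, dim=0)                    # (T,)
        stacked = torch.stack(teacher_feats, dim=0)               # (T, N, C, H, W)
        combined = (coeffs.view(-1, 1, 1, 1, 1) * stacked).sum(0) # (N, C, H, W)
        # Feature-level distillation loss between the student and the combined teacher.
        return F.mse_loss(student_feat, combined)

# Toy usage: three frozen teachers, feature maps of shape (2, 16, 8, 8).
distiller = AdaptiveEnsembleDistiller(num_teachers=3)
student_feat = torch.randn(2, 16, 8, 8, requires_grad=True)
teacher_feats = [torch.randn(2, 16, 8, 8) for _ in range(3)]
loss = distiller(student_feat, teacher_feats)
loss.backward()  # gradients flow to both the student features and the coefficients

In this sketch the coefficients are simply learned by backpropagation alongside the student; the paper's contribution lies in how those coefficients are set adaptively during the search.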
Pages: 13489-13508
Number of Pages: 20