A tucker decomposition based knowledge distillation for intelligent edge applications

Times Cited: 19
Authors
Dai, Cheng [1 ,2 ]
Liu, Xingang [1 ]
Li, Zhuolin [1 ]
Chen, Mu-Yen [3 ]
Affiliations
[1] Univ Elect Sci & Technol China UESTC, Sch Informat & Commun Engn, Chengdu, Peoples R China
[2] McMaster Univ, Dept Elect Engn & Comp Sci, Hamilton, ON L8S 4K1, Canada
[3] Natl Cheng Kung Univ, Dept Engn Sci, Tainan, Taiwan
Funding
National Natural Science Foundation of China
Keywords
Knowledge distillation; Intelligent edge computing; Deep learning; Tensor decomposition; DEEP COMPUTATION MODEL;
DOI
10.1016/j.asoc.2020.107051
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405
Abstract
Knowledge distillation (KD) has proven effective for intelligent edge computing and has been studied extensively in recent deep learning research. However, when the teacher network is much stronger than the student network, distillation performs poorly. To address this problem, an improved knowledge distillation method (TDKD) is proposed, which transfers the complex mapping functions learned by a cumbersome model to a relatively simple one. First, Tucker-2 decomposition is applied to the convolutional layers of the original teacher model to reduce the capacity gap between the teacher and student networks. The decomposed model is then used as a new teacher in knowledge distillation for the student model. Experimental results show that TDKD effectively mitigates poor distillation performance: it not only improves results in cases where standard KD already works, but can also, to some extent, reactivate KD in cases where it fails. (C) 2020 Elsevier B.V. All rights reserved.
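The two stages summarized in the abstract can be sketched in a minimal NumPy illustration: a Tucker-2 decomposition of a convolutional kernel over its two channel modes (via truncated HOSVD), and a Hinton-style temperature-softened distillation loss. Function names, ranks, and the HOSVD variant are assumptions for illustration, not details taken from the paper itself.

```python
import numpy as np

def tucker2_conv(W, r_out, r_in):
    """Tucker-2 decomposition of a conv kernel W of shape
    (C_out, C_in, kh, kw) along the two channel modes, via truncated HOSVD."""
    c_out, c_in, _, _ = W.shape
    # Mode-0 unfolding: rows indexed by output channels.
    U0, _, _ = np.linalg.svd(W.reshape(c_out, -1), full_matrices=False)
    U0 = U0[:, :r_out]                                   # (C_out, r_out)
    # Mode-1 unfolding: rows indexed by input channels.
    U1, _, _ = np.linalg.svd(np.moveaxis(W, 1, 0).reshape(c_in, -1),
                             full_matrices=False)
    U1 = U1[:, :r_in]                                    # (C_in, r_in)
    # Core tensor: project both channel modes onto the factor bases.
    core = np.einsum('oihw,or,is->rshw', W, U0, U1)      # (r_out, r_in, kh, kw)
    return core, U0, U1

def tucker2_reconstruct(core, U0, U1):
    """Rebuild the (approximate) kernel from the core and channel factors."""
    return np.einsum('rshw,or,is->oihw', core, U0, U1)

def soft_kd_loss(teacher_logits, student_logits, T=4.0):
    """Hinton-style distillation loss: KL divergence between
    temperature-softened teacher and student distributions, scaled by T^2."""
    def softmax(z):
        z = z - z.max(axis=-1, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=-1, keepdims=True)
    p = softmax(teacher_logits / T)
    q = softmax(student_logits / T)
    return (T ** 2) * np.sum(p * (np.log(p) - np.log(q)), axis=-1).mean()
```

Choosing `r_out`/`r_in` below the original channel counts shrinks the teacher's capacity, which is the mechanism the paper uses to narrow the teacher–student gap before distilling; with full ranks the reconstruction is exact, so the approximation quality is tunable.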
Pages: 7
References (25)
[1] Ahn D. International SoC Design Conference (ISOCC), 2016: 1. DOI: 10.1109/ISOCC.2016.7799763
[2] Ba L. J. Advances in Neural Information Processing Systems, 2014, 27.
[3] Cheng Y., Wang D., Zhou P., Zhang T. Model Compression and Acceleration for Deep Neural Networks: The Principles, Progress, and Challenges. IEEE Signal Processing Magazine, 2018, 35(1): 126-136.
[4] Cho J. H., Hariharan B. On the Efficacy of Knowledge Distillation. 2019 IEEE/CVF International Conference on Computer Vision (ICCV), 2019: 4793-4801.
[5] Choudhary T., Mishra V., Goswami A., Sarangapani J. A Comprehensive Survey on Model Compression and Acceleration. Artificial Intelligence Review, 2020, 53(7): 5113-5155.
[6] Dai C., Liu X., Yang L. T., Ni M., Ma Z., Zhang Q., Deen M. J. Video Scene Segmentation Using Tensor-Train Faster-RCNN for Multimedia IoT Systems. IEEE Internet of Things Journal, 2021, 8(12): 9697-9705.
[7] Dai C., Liu X., Chen W., Lai C.-F. A Low-Latency Object Detection Algorithm for the Edge Devices of IoV Systems. IEEE Transactions on Vehicular Technology, 2020, 69(10): 11169-11178.
[8] Dai C., Liu X., Lai J., Li P., Chao H.-C. Human Behavior Deep Recognition Architecture for Smart City Applications in the 5G Environment. IEEE Network, 2019, 33(5): 206-211.
[9] Fu S., Li Z., Liu K., Din S., Imran M., Yang X. Model Compression for IoT Applications in Industry 4.0 via Multiscale Knowledge Transfer. IEEE Transactions on Industrial Informatics, 2020, 16(9): 6013-6022.
[10] Hinton G. NIPS Deep Learning and Representation Learning Workshop, 2015: 38. DOI: 10.48550/arXiv.1503.02531