(2+1)D Distilled ShuffleNet: A Lightweight Unsupervised Distillation Network for Human Action Recognition

Cited by: 11
Authors
Vu, Duc-Quang [1, 2]
Le, Ngan T. H. [3]
Wang, Jia-Ching [1]
Affiliations
[1] Natl Cent Univ, Dept Comp Sci & Informat Engn, Taoyuan, Taiwan
[2] Thai Nguyen Univ Educ, Thai Nguyen, Vietnam
[3] Univ Arkansas, Dept Comp Sci & Comp Engn, Fayetteville, AR USA
Source
2022 26TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR) | 2022
Funding
National Science Foundation (USA)
DOI
10.1109/ICPR56361.2022.9956634
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
While most existing deep neural network (DNN) architectures are proposed to increase performance, they also raise overall model complexity. Practical applications, however, require lightweight DNN models that can run in real time on edge computing devices. In this work, we present a simple and elegant unsupervised distillation learning paradigm for training a lightweight human action recognition network, called (2+1)D Distilled ShuffleNet. By leveraging distillation, the proposed method yields a lightweight DNN model that achieves high accuracy at real-time speed. Our lightweight (2+1)D Distilled ShuffleNet is designed as an unsupervised paradigm: it does not require labelled data while distilling knowledge from the teacher to the student. Furthermore, to help the student be more "intelligent", we propose distilling knowledge from two different teachers, i.e., a 2D teacher and a 3D teacher. Experimental results show that our lightweight (2+1)D Distilled ShuffleNet outperforms other state-of-the-art distillation networks, with 86.4% and 59.9% top-1 accuracy on the UCF101 and HMDB51 datasets, respectively, while running inference at 47.16 FPS on CPU with only 17.1M parameters and 12.07 GFLOPs.
Pages: 3197-3203
Page count: 7
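The abstract describes label-free distillation from a 2D teacher and a 3D teacher into one student, but this record does not give the loss. The sketch below is only an illustration of that general idea, assuming a weighted sum of temperature-softened KL divergences between each teacher's output and the student's output. The function names, the temperature, and the weighting parameter `weight_2d` are hypothetical and are not taken from the paper.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def two_teacher_distillation_loss(student_logits, teacher_2d_logits,
                                  teacher_3d_logits, temperature=4.0,
                                  weight_2d=0.5):
    """Hypothetical label-free loss: weighted KL from each teacher to the student.

    No ground-truth label appears anywhere; the only training signal is the
    pair of teacher output distributions (an assumption of this sketch).
    """
    student = softmax(student_logits, temperature)
    teacher_2d = softmax(teacher_2d_logits, temperature)
    teacher_3d = softmax(teacher_3d_logits, temperature)
    loss_2d = kl_divergence(teacher_2d, student)
    loss_3d = kl_divergence(teacher_3d, student)
    return weight_2d * loss_2d + (1.0 - weight_2d) * loss_3d


if __name__ == "__main__":
    # Toy logits for a single clip over three action classes.
    loss = two_teacher_distillation_loss(
        student_logits=[0.2, 1.0, -0.5],
        teacher_2d_logits=[0.1, 2.0, -1.0],
        teacher_3d_logits=[0.0, 1.5, -0.8],
    )
    print(f"distillation loss: {loss:.4f}")
```

In this sketch the loss is zero when the student reproduces both teachers exactly and grows as its predictions drift from either one; `weight_2d` trades off how strongly each teacher is followed.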