Few-Shot Learning of Video Action Recognition Only Based on Video Contents

Cited by: 0
Authors
Bo, Yang [1]
Lu, Yangdi [1]
He, Wenbo [1]
Affiliations
[1] McMaster Univ, Hamilton, ON, Canada
Source
2020 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV) | 2020
DOI
10.1109/wacv45572.2020.9093481
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The success of video action recognition based on Deep Neural Networks (DNNs) is highly dependent on a large number of manually labeled videos. In this paper, we introduce a supervised learning approach to recognize video actions with very few training videos. Specifically, we propose Temporal Attention Vectors (TAVs), which adapt to videos of various lengths and preserve the temporal information of the entire video. We evaluate the TAVs on UCF101 and HMDB51. Without training any deep 3D or 2D frame feature extractors on video datasets (they are only pre-trained on ImageNet), the TAVs introduce only 2.1M parameters yet outperform the state-of-the-art video action recognition benchmarks with very few labeled training videos (e.g., 92% on UCF101 and 59% on HMDB51, with 10 and 8 training videos per class, respectively). Furthermore, our approach still achieves competitive results on the full datasets (97.1% on UCF101 and 77% on HMDB51).
Pages: 584-593
Page count: 10