Few-Shot Action Recognition via Multi-View Representation Learning

Cited by: 5
Authors
Wang, Xiao [1]
Lu, Yang [1]
Yu, Wanchuan [1]
Pang, Yanwei [2,3]
Wang, Hanzi [1,3]
Affiliations
[1] Xiamen Univ, Fujian Key Lab Sensing & Comp Smart City, Sch Informat, Xiamen 361005, Peoples R China
[2] Tianjin Univ, Sch Elect & Informat Engn, Tianjin Key Lab Brain Inspired Intelligence Techn, Tianjin 300072, Peoples R China
[3] Shanghai Artificial Intelligence Lab, Shanghai 200232, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Few-shot learning; action recognition; meta-learning; multi-view representation learning;
DOI
10.1109/TCSVT.2024.3384875
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Discipline Codes
0808; 0809;
Abstract
Few-shot action recognition aims to recognize novel action classes from only a few labeled samples and has recently received increasing attention. Its core objective is to enhance the discriminability of feature representations. In this paper, we propose a novel multi-view representation learning network (MRLN) to model intra-video and inter-video relations for few-shot action recognition. Specifically, we first propose a spatial-aware aggregation refinement module (SARM), which mainly consists of a spatial-aware aggregation sub-module and a spatial-aware refinement sub-module to explore the spatial context of samples at the frame level. Second, we design a temporal-channel enhancement module (TCEM), which captures the temporal-aware and channel-aware features of samples with an elaborately designed temporal-aware enhancement sub-module and channel-aware enhancement sub-module. Third, we introduce a cross-video relation module (CVRM), which explores relations across videos via the self-attention mechanism. Moreover, we design a prototype-centered mean absolute error loss to improve the feature learning capability of the proposed MRLN. Extensive experiments on four prevalent few-shot action recognition benchmarks show that the proposed MRLN significantly outperforms a variety of state-of-the-art few-shot action recognition methods. In particular, under the 5-way 1-shot setting, MRLN achieves 75.7%, 86.9%, 65.5% and 45.9% on the Kinetics, UCF101, HMDB51 and SSv2 datasets, respectively.
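The abstract names a prototype-centered mean absolute error loss but does not give its formulation. Below is a minimal, hypothetical sketch of one plausible reading: each sample's feature is pulled toward its class prototype (the mean feature of same-class samples) under an L1 distance. The function name, the per-class averaging, and the use of the class mean as prototype are all assumptions, not the paper's stated method.

```python
import numpy as np

def prototype_centered_mae_loss(features, labels):
    # Hypothetical sketch: for each class, form the prototype as the mean
    # feature of that class's samples, then penalize the mean absolute
    # deviation of those samples from their prototype.
    classes = np.unique(labels)
    loss = 0.0
    for c in classes:
        class_feats = features[labels == c]    # samples belonging to class c
        prototype = class_feats.mean(axis=0)   # class prototype (mean feature)
        loss += np.abs(class_feats - prototype).mean()
    return loss / len(classes)                 # average over classes

# toy usage: 6 samples, 4-dim features, 2 classes
feats = np.random.randn(6, 4)
labels = np.array([0, 0, 0, 1, 1, 1])
print(prototype_centered_mae_loss(feats, labels))
```

A loss of this shape tightens intra-class clusters around their prototypes, which is consistent with the abstract's stated goal of making feature representations more discriminative under limited supervision.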
Pages: 8522-8535
Page count: 14