Few-Shot Action Recognition via Multi-View Representation Learning

Cited by: 5
Authors
Wang, Xiao [1]
Lu, Yang [1]
Yu, Wanchuan [1]
Pang, Yanwei [2,3]
Wang, Hanzi [1,3]
Affiliations
[1] Xiamen Univ, Fujian Key Lab Sensing & Comp Smart City, Sch Informat, Xiamen 361005, Peoples R China
[2] Tianjin Univ, Sch Elect & Informat Engn, Tianjin Key Lab Brain Inspired Intelligence Techn, Tianjin 300072, Peoples R China
[3] Shanghai Artificial Intelligence Lab, Shanghai 200232, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Few-shot learning; action recognition; meta-learning; multi-view representation learning;
DOI
10.1109/TCSVT.2024.3384875
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Discipline Codes
0808; 0809;
Abstract
Few-shot action recognition aims to recognize novel action classes from only a few labeled samples and has recently received increasing attention. Its core objective is to enhance the discriminability of feature representations. In this paper, we propose a novel multi-view representation learning network (MRLN) to model intra-video and inter-video relations for few-shot action recognition. Specifically, we first propose a spatial-aware aggregation refinement module (SARM), which mainly consists of a spatial-aware aggregation sub-module and a spatial-aware refinement sub-module to explore the spatial context of samples at the frame level. Second, we design a temporal-channel enhancement module (TCEM), which captures the temporal-aware and channel-aware features of samples with an elaborately designed temporal-aware enhancement sub-module and channel-aware enhancement sub-module. Third, we introduce a cross-video relation module (CVRM), which explores relations across videos via the self-attention mechanism. Moreover, we design a prototype-centered mean absolute error loss to improve the feature learning capability of the proposed MRLN. Extensive experiments on four prevalent few-shot action recognition benchmarks show that the proposed MRLN significantly outperforms a variety of state-of-the-art few-shot action recognition methods. In particular, under the 5-way 1-shot setting, MRLN achieves 75.7%, 86.9%, 65.5% and 45.9% on the Kinetics, UCF101, HMDB51 and SSv2 datasets, respectively.
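The abstract names a prototype-centered mean absolute error loss but does not give its formulation. Below is a minimal, hypothetical sketch of one plausible reading: each sample's feature is pulled toward its class prototype (the mean feature of same-class samples) under an L1 distance. The function name, the per-class averaging, and the use of the class mean as prototype are all assumptions, not the paper's stated method.

```python
import numpy as np

def prototype_centered_mae_loss(features, labels):
    # Hypothetical sketch: for each class, form the prototype as the mean
    # feature of that class's samples, then penalize the mean absolute
    # deviation of those samples from their prototype.
    classes = np.unique(labels)
    loss = 0.0
    for c in classes:
        class_feats = features[labels == c]    # samples belonging to class c
        prototype = class_feats.mean(axis=0)   # class prototype (mean feature)
        loss += np.abs(class_feats - prototype).mean()
    return loss / len(classes)                 # average over classes

# toy usage: 6 samples, 4-dim features, 2 classes
feats = np.random.randn(6, 4)
labels = np.array([0, 0, 0, 1, 1, 1])
print(prototype_centered_mae_loss(feats, labels))
```

A loss of this shape tightens intra-class clusters around their prototypes, which is consistent with the abstract's stated goal of making feature representations more discriminative under limited supervision.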
Pages: 8522-8535
Page count: 14