Convolutional Self-attention Guided Graph Neural Network for Few-Shot Action Recognition

Times Cited: 0
Authors
Pan, Fei [1 ]
Guo, Jie [1 ]
Guo, Yanwen [1 ]
Affiliations
[1] Nanjing Univ, Natl Key Lab Novel Software Technol, Nanjing, Peoples R China
Source
ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, ICIC 2023, PT II | 2023, Vol. 14087
Funding
National Natural Science Foundation of China;
Keywords
Action recognition; Few-shot learning; Neural Network;
DOI
10.1007/978-981-99-4742-3_33
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The goal of few-shot action recognition is to recognize unseen action classes from only a few labeled videos. In this paper, we propose the Convolutional Self-Attention Guided Graph Neural Network (CSA-GNN) for few-shot action recognition. First, for each video, we extract features of frames sampled from the video and obtain a sequence of feature vectors. Then, a convolutional self-attention function is applied to the sequence to capture long-term temporal dependencies. Finally, a graph neural network explicitly predicts the distance between two sequences of feature vectors, which approximates the distance between the corresponding videos. In this way, we learn the distance between support and query videos effectively, without estimating their temporal alignment. The proposed method is evaluated on four action recognition datasets and achieves state-of-the-art results.
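The abstract's second step, convolutional self-attention over a sequence of frame features, can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: it assumes queries and keys are produced by a same-padded temporal 1D convolution over the frame-feature sequence (kernel size, dimensions, and the plain linear value map are illustrative choices), followed by standard scaled dot-product attention.

```python
import numpy as np

rng = np.random.default_rng(0)

def temporal_conv(x, w):
    """Same-padded 1D convolution along the time axis.
    x: (T, d_in) frame features; w: (k, d_in, d_out) kernel (hypothetical shapes)."""
    k, d_in, d_out = w.shape
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    out = np.zeros((x.shape[0], d_out))
    for t in range(x.shape[0]):
        window = xp[t:t + k]                      # (k, d_in) temporal window
        out[t] = np.einsum("ki,kio->o", window, w)
    return out

def conv_self_attention(x, wq, wk, wv):
    """Queries/keys from temporal convolutions, values from a linear map,
    then scaled dot-product attention over all time steps."""
    q = temporal_conv(x, wq)                      # (T, d)
    k = temporal_conv(x, wk)                      # (T, d)
    v = x @ wv                                    # (T, d)
    scores = q @ k.T / np.sqrt(q.shape[1])        # (T, T) pairwise scores
    attn = np.exp(scores - scores.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)       # row-wise softmax
    return attn @ v                               # (T, d) attended features

T, d, ksize = 8, 16, 3                            # 8 sampled frames, 16-dim features
x = rng.standard_normal((T, d))
wq = rng.standard_normal((ksize, d, d)) * 0.1
wk = rng.standard_normal((ksize, d, d)) * 0.1
wv = rng.standard_normal((d, d)) * 0.1
y = conv_self_attention(x, wq, wk, wv)
print(y.shape)                                    # (8, 16)
```

Because each query/key is computed from a local temporal window rather than a single frame, the attention weights can reflect short motion patterns while the softmax itself spans the whole sequence, which is one plausible reading of "long-term temporal dependencies" in the abstract.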
Pages: 401-412
Number of Pages: 12