Task Adaptive Modeling for Few-shot Action Recognition

Cited by: 0
Authors
Wang, Jiayi [1 ]
Jin, Yi [1 ]
Feng, Songhe [1 ]
Li, Yidong [1 ]
Affiliations
[1] Beijing JiaoTong Univ, Sch Comp & Informat Technol, Beijing, Peoples R China
Source
2022 IEEE 24TH INTERNATIONAL WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING (MMSP) | 2022
Funding
National Natural Science Foundation of China;
Keywords
action recognition; few-shot learning; task adaptive; video classification;
DOI
10.1109/MMSP55362.2022.9949513
CLC Classification Number
TP31 [Computer Software];
Subject Classification Code
081202 ; 0835 ;
Abstract
Collecting action recognition datasets is time-consuming and labor-intensive. To alleviate this, few-shot action recognition has emerged, in which models are learned through episodic training. However, because few-shot tasks are sampled randomly, individual tasks differ greatly from one another and the characteristics of their classes are diverse. Most current methods simply apply the same processing pipeline to every task, ignoring the correlations within each task. To address this, we propose a task adaptive network for few-shot action recognition that exploits the dependency between support-set and query-set categories. Our method has two key components. First, we add an attention module after the feature extractor, which uses an attention mechanism to focus the obtained feature representation on the more important local information. Second, we design a task adaptive module that uses the support-set samples to strengthen all samples of the current task: it reinforces the features shared within each support-set class and augments the query set to highlight the differences between classes. We conduct extensive experiments on two widely used action recognition datasets, HMDB51 and UCF101. The results show that our method is highly competitive and performs well on few-shot action recognition.
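The abstract describes the two modules only at a high level; the paper's exact formulation is not given here. The following is a minimal sketch in PyTorch of one way such a pipeline could look, assuming pre-pooled clip embeddings from a backbone, a squeeze-and-excitation style attention block (in the spirit of reference [10]), and a prototype-based task adaptive step. All class names, tensor shapes, and the soft-assignment step are hypothetical illustrations, not the authors' implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelAttention(nn.Module):
    """SE-style attention after the backbone: re-weights feature channels so the
    representation emphasises the more informative local cues (hypothetical)."""
    def __init__(self, dim: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(dim, dim // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(dim // reduction, dim),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, dim) pooled clip features
        return x * self.fc(x)

class TaskAdaptiveModule(nn.Module):
    """Strengthens the features shared within each support class and pushes query
    features toward the prototypes they resemble (a sketch, not the exact method)."""
    def forward(self, support: torch.Tensor, query: torch.Tensor):
        # support: (n_way, k_shot, dim); query: (n_query, dim)
        prototypes = support.mean(dim=1)                     # (n_way, dim)
        # Reinforce within-class commonality in the support set.
        support = support + prototypes.unsqueeze(1)
        # Soft-assign each query to the prototypes and add the weighted mixture,
        # enlarging between-class differences in the query embedding.
        sim = F.softmax(
            F.normalize(query, dim=-1) @ F.normalize(prototypes, dim=-1).t(), dim=-1
        )
        query = query + sim @ prototypes
        return support, query

if __name__ == "__main__":
    n_way, k_shot, n_query, dim = 5, 1, 10, 512
    att, tam = ChannelAttention(dim), TaskAdaptiveModule()
    support = att(torch.randn(n_way * k_shot, dim)).view(n_way, k_shot, dim)
    query = att(torch.randn(n_query, dim))
    support, query = tam(support, query)
    # Classify queries by cosine similarity to the refined prototypes.
    logits = F.normalize(query, dim=-1) @ F.normalize(support.mean(1), dim=-1).t()
    print(logits.shape)  # (n_query, n_way)

In an episodic setting, each sampled task would pass its support and query clips through the shared backbone and attention block, refine both sets with the task adaptive module, and classify queries against the refined class prototypes.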
Pages: 6
References
29 references in total
  • [1] Bishay M., 2019, BMVC
  • [2] Cao KD, 2020, PROC CVPR IEEE, P10615, DOI 10.1109/CVPR42600.2020.01063
  • [3] Chikontwe P., 2022, ARXIV
  • [4] Tran D, Bourdev L, Fergus R, Torresani L, Paluri M. Learning Spatiotemporal Features with 3D Convolutional Networks. 2015 IEEE International Conference on Computer Vision (ICCV), 2015: 4489-4497
  • [5] Dwivedi S K, Gupta V, Mitra R, Ahmed S, Jain A. ProtoGAN: Towards Few Shot Learning for Action Recognition. 2019 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), 2019: 1308-1316
  • [6] Feichtenhofer C, Fan H, Malik J, He K. SlowFast Networks for Video Recognition. 2019 IEEE/CVF International Conference on Computer Vision (ICCV), 2019: 6201-6210
  • [7] Godard C, Mac Aodha O, Firman M, Brostow G. Digging Into Self-Supervised Monocular Depth Estimation. 2019 IEEE/CVF International Conference on Computer Vision (ICCV), 2019: 3827-3837
  • [8] He K, Zhang X, Ren S, Sun J. Deep Residual Learning for Image Recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016: 770-778
  • [9] Zhang H, 2020, Computer Vision - ECCV 2020, 16th European Conference, Proceedings, Lecture Notes in Computer Science (LNCS 12348), P102, DOI 10.1007/978-3-030-58580-8_7
  • [10] Hu J, Shen L, Albanie S, Sun G, Wu E. Squeeze-and-Excitation Networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, 42(8): 2011-2023