SMAM: Self and Mutual Adaptive Matching for Skeleton-Based Few-Shot Action Recognition

Cited by: 13
Authors
Li, Zhiheng [1 ]
Gong, Xuyuan [1 ]
Song, Ran [1 ]
Duan, Peng [2 ]
Liu, Jun [3 ]
Zhang, Wei [1 ]
Affiliations
[1] Shandong Univ, Sch Control Sci & Engn, Jinan, Peoples R China
[2] Liaocheng Univ, Sch Comp Sci, Liaocheng, Peoples R China
[3] Singapore Univ Technol & Design, Informat Syst Technol & Design Pillar, Singapore, Singapore
Funding
National Natural Science Foundation of China;
Keywords
Skeleton-based; action recognition; few-shot learning;
DOI
10.1109/TIP.2022.3226410
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper focuses on skeleton-based few-shot action recognition. Since a skeleton is essentially a sparse representation of a human action, the feature maps extracted from it through a standard encoder network under the few-shot condition may not be sufficiently discriminative for action sequences that look partially similar to each other. To address this issue, we propose a self and mutual adaptive matching (SMAM) module that converts such feature maps into more discriminative feature vectors. Our method, named SMAM-Net, first leverages both the temporal information associated with each individual skeleton joint and the spatial relationships among the joints for feature extraction. Then, the SMAM module adaptively measures the similarity between labeled and query samples and further carries out feature matching within the query set to distinguish similar skeletons belonging to different action categories. Experimental results show that SMAM-Net outperforms other baselines on the large-scale NTU RGB+D 120 dataset in both one-shot and five-shot action recognition. We also report results on smaller datasets, including NTU RGB+D 60, SYSU and PKU-MMD, to demonstrate that our method is reliable and generalises well across datasets. Code and the pretrained SMAM-Net will be made publicly available.
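For orientation, the sketch below illustrates the generic few-shot matching setup the abstract describes: cosine-similarity matching between labeled support features and unlabeled query features in an N-way one-shot episode. It is a minimal, assumed example; the function names and the plain nearest-prototype rule are placeholders and do not reproduce the adaptive weighting or query-to-query matching of the actual SMAM module.

```python
# Illustrative sketch only: generic cosine-similarity matching between support
# and query features for N-way one-shot classification. The adaptive weighting
# and within-query matching of the real SMAM module are NOT reproduced here;
# all names below are hypothetical.
import torch
import torch.nn.functional as F

def one_shot_match(support_feats: torch.Tensor,  # (n_way, d), one labeled sample per class
                   query_feats: torch.Tensor     # (n_query, d), unlabeled query samples
                   ) -> torch.Tensor:
    """Return a predicted class index for each query via cosine similarity."""
    s = F.normalize(support_feats, dim=-1)   # unit-normalise support features
    q = F.normalize(query_feats, dim=-1)     # unit-normalise query features
    sim = q @ s.t()                          # (n_query, n_way) cosine similarities
    return sim.argmax(dim=-1)                # nearest-prototype prediction

# Toy usage with random features standing in for encoder outputs.
support = torch.randn(5, 256)                # 5-way, one shot
queries = torch.randn(12, 256)
print(one_shot_match(support, queries))      # tensor of 12 class indices
```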
Pages: 392-402
Page count: 11