SMAM: Self and Mutual Adaptive Matching for Skeleton-Based Few-Shot Action Recognition

Cited by: 13
Authors
Li, Zhiheng [1 ]
Gong, Xuyuan [1 ]
Song, Ran [1 ]
Duan, Peng [2 ]
Liu, Jun [3 ]
Zhang, Wei [1 ]
Affiliations
[1] Shandong Univ, Sch Control Sci & Engn, Jinan, Peoples R China
[2] Liaocheng Univ, Sch Comp Sci, Liaocheng, Peoples R China
[3] Singapore Univ Technol & Design, Informat Syst Technol & Design Pillar, Singapore, Singapore
Funding
National Natural Science Foundation of China;
Keywords
Skeleton-based; action recognition; few-shot learning;
DOI
10.1109/TIP.2022.3226410
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper focuses on skeleton-based few-shot action recognition. Since a skeleton is essentially a sparse representation of a human action, the feature maps extracted from it through a standard encoder network under the few-shot condition may not be sufficiently discriminative for action sequences that look partially similar to each other. To address this issue, we propose a self and mutual adaptive matching (SMAM) module that converts such feature maps into more discriminative feature vectors. Our method, named SMAM-Net, first leverages both the temporal information associated with each individual skeleton joint and the spatial relationships among the joints for feature extraction. Then, the SMAM module adaptively measures the similarity between labeled and query samples and further carries out feature matching within the query set to distinguish similar skeletons belonging to different action categories. Experimental results show that SMAM-Net outperforms other baselines on the large-scale NTU RGB+D 120 dataset in both one-shot and five-shot action recognition. We also report results on smaller datasets, including NTU RGB+D 60, SYSU and PKU-MMD, to demonstrate that our method is reliable and generalises well across datasets. Code and the pretrained SMAM-Net will be made publicly available.
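For orientation, the sketch below illustrates the generic few-shot matching setup the abstract describes: cosine-similarity matching between labeled support features and unlabeled query features in an N-way one-shot episode. It is a minimal, assumed example; the function names and the plain nearest-prototype rule are placeholders and do not reproduce the adaptive weighting or query-to-query matching of the actual SMAM module.

```python
# Illustrative sketch only: generic cosine-similarity matching between support
# and query features for N-way one-shot classification. The adaptive weighting
# and within-query matching of the real SMAM module are NOT reproduced here;
# all names below are hypothetical.
import torch
import torch.nn.functional as F

def one_shot_match(support_feats: torch.Tensor,  # (n_way, d), one labeled sample per class
                   query_feats: torch.Tensor     # (n_query, d), unlabeled query samples
                   ) -> torch.Tensor:
    """Return a predicted class index for each query via cosine similarity."""
    s = F.normalize(support_feats, dim=-1)   # unit-normalise support features
    q = F.normalize(query_feats, dim=-1)     # unit-normalise query features
    sim = q @ s.t()                          # (n_query, n_way) cosine similarities
    return sim.argmax(dim=-1)                # nearest-prototype prediction

# Toy usage with random features standing in for encoder outputs.
support = torch.randn(5, 256)                # 5-way, one shot
queries = torch.randn(12, 256)
print(one_shot_match(support, queries))      # tensor of 12 class indices
```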
Pages: 392-402
Page count: 11