SlowFast Multimodality Compensation Fusion Swin Transformer Networks for RGB-D Action Recognition

Cited by: 4
Authors
Xiao, Xiongjiang [1]
Ren, Ziliang [1]
Li, Huan [1]
Wei, Wenhong [1]
Yang, Zhiyong [2]
Yang, Huaide [3]
Affiliations
[1] Dongguan Univ Technol, Sch Comp Sci & Technol, Dongguan 523820, Peoples R China
[2] Yantai Inst Technol, Sch Artificial Intelligence, Yantai 264003, Peoples R China
[3] Dongguan Polytech, Sch Elect Informat, Dongguan 523109, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
action recognition; multimodality compensation; SlowFast pathways; swin transformer; dual-stream; NEURAL-NETWORKS; REPRESENTATION;
DOI
10.3390/math11092115
Chinese Library Classification number
O1 [Mathematics];
Subject classification code
0701; 070101;
Abstract
RGB-D-based methods combine the advantages of RGB and depth sequences and can effectively recognize human actions in different environments. However, it is difficult for the different modalities to learn spatio-temporal information from each other effectively. To enhance the information exchange between modalities, we introduce a SlowFast multimodality compensation block (SFMCB), which is designed to extract compensation features. Concretely, the SFMCB fuses features from two independent pathways with different frame rates into a single convolutional neural network to improve the model's performance. Furthermore, we explore two fusion schemes for combining the features from the two pathways. To facilitate the learning of features from the multiple independent pathways, multiple loss functions are used for joint optimization. To evaluate our proposed architecture, we conducted experiments on four challenging datasets: NTU RGB+D 60, NTU RGB+D 120, THU-READ, and PKU-MMD. The experimental results demonstrate the effectiveness of the proposed model, which uses the SFMCB mechanism to capture complementary features of multimodal inputs.
Pages: 19
Related papers
53 in total
  • [1] ViViT: A Video Vision Transformer
    Arnab, Anurag
    Dehghani, Mostafa
    Heigold, Georg
    Sun, Chen
    Lucic, Mario
    Schmid, Cordelia
    [J]. 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 6816 - 6826
  • [2] Action Recognition with Dynamic Image Networks
    Bilen, Hakan
    Fernando, Basura
    Gavves, Efstratios
    Vedaldi, Andrea
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2018, 40 (12) : 2799 - 2813
  • [3] Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
    Carreira, Joao
    Zisserman, Andrew
    [J]. 30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, : 4724 - 4733
  • [4] Cross-Modality Compensation Convolutional Neural Networks for RGB-D Action Recognition
    Cheng, Jun
    Ren, Ziliang
    Zhang, Qieshi
    Gao, Xiangyang
    Hao, Fusheng
    [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (03) : 1498 - 1509
  • [5] A comprehensive survey of human action recognition with spatio-temporal interest point (STIP) detector
    Das Dawn, Debapratim
    Shaikh, Soharab Hossain
    [J]. VISUAL COMPUTER, 2016, 32 (03) : 289 - 306
  • [6] Das Srijan, 2020, Computer Vision - ECCV 2020. 16th European Conference. Proceedings. Lecture Notes in Computer Science (LNCS 12354), P72, DOI 10.1007/978-3-030-58545-7_5
  • [7] Long-Term Recurrent Convolutional Networks for Visual Recognition and Description
    Donahue, Jeff
    Hendricks, Lisa Anne
    Rohrbach, Marcus
    Venugopalan, Subhashini
    Guadarrama, Sergio
    Saenko, Kate
    Darrell, Trevor
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2017, 39 (04) : 677 - 691
  • [8] Dosovitskiy A, 2021, Arxiv, DOI arXiv:2010.11929
  • [9] Learning Spatiotemporal Features with 3D Convolutional Networks
    Du Tran
    Bourdev, Lubomir
    Fergus, Rob
    Torresani, Lorenzo
    Paluri, Manohar
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015, : 4489 - 4497
  • [10] Understanding the Gap between 2D and 3D Skeleton-Based Action Recognition
    Elias, Petr
    Sedmidubsky, Jan
    Zezula, Pavel
    [J]. 2019 IEEE INTERNATIONAL SYMPOSIUM ON MULTIMEDIA (ISM 2019), 2019, : 192 - 195