A Multi-scale Interaction Motion Network for Action Recognition Based on Capsule Network

Cited by: 0
Authors
Zheng, Xiangping [1]
Liang, Xun [1]
Wu, Bo [1]
Wang, Jun [2]
Guo, Yuhui [1]
Zhang, Xuan [1]
Mai, Yuefeng [3]
Affiliations
[1] Renmin Univ China, Infomat Sch, Beijing, Peoples R China
[2] Swinburne Univ Technol, Melbourne, Vic, Australia
[3] Qufu Normal Univ, Shandong, Peoples R China
Source
PROCEEDINGS OF THE 2023 SIAM INTERNATIONAL CONFERENCE ON DATA MINING, SDM | 2023
Funding
National Natural Science Foundation of China
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Action recognition has recently achieved impressive performance, largely thanks to deep convolutional neural networks and large datasets. Most prior efforts capture motion information through dense optical flow, but optical flow extraction is very time-consuming. Moreover, prior work pursues accuracy while neglecting the part-whole relationships between objects in videos, which can be self-defeating and even degrade performance. To address these challenges, we present a novel collaborative multipath capsule network (CMCN) for action recognition. In particular, we propose a plug-and-play collaborative multipath block containing spatiotemporal, channel, and motion units, which provide complementary and crucial information for action recognition. We exploit the interaction of these three units and selectively emphasize informative spatiotemporal motion to reduce the expensive computational cost. We then introduce a new capsule voting procedure to reduce the computation required by the capsule dynamic routing mechanism. The key insight is that capsules of the same type model the same entity at different positions, so their voting results should be consistent. This strategy reduces the number of learnable parameters updated in the backward pass during training and thus strengthens part-whole relationships in a video. Extensive experiments on multiple real-world action recognition datasets demonstrate that our model significantly outperforms state-of-the-art models.
Pages: 505-513
Page count: 9