Learning spatial-temporal features via a pose-flow relational model for action recognition

Cited: 2
Authors
Wu, Qianyu [1 ]
Hu, Fangqiang [1 ]
Zhu, Aichun [1 ,2 ]
Wang, Zixuan [1 ]
Bao, Yaping [1 ]
Affiliations
[1] Nanjing Tech Univ, Sch Comp Sci & Technol, Nanjing 210000, Peoples R China
[2] China Univ Min & Technol, Sch Informat & Control Engn, Xuzhou 221000, Jiangsu, Peoples R China
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation
DOI
10.1063/5.0011161
Chinese Library Classification (CLC)
TB3 [Engineering Materials]
Discipline Classification Codes
0805; 080502
Abstract
Pose-based action recognition has always been an important research field in computer vision. However, most existing pose-based methods are built upon human skeleton data, which cannot be used to exploit the feature of the motion-related object, i.e., a crucial clue of discriminating human actions. To address this issue, we propose a novel pose-flow relational model, which can benefit from both pose dynamics and optical flow. First, we introduce a pose estimation module to extract the skeleton data of the key person from the raw video. Second, a hierarchical pose-based network is proposed to effectively explore the rich spatial-temporal features of human skeleton positions. Third, we embed an inflated 3D network to capture the subtle cues of the motion-related object from optical flow. Additionally, we evaluate our model on four popular action recognition benchmarks (HMDB-51, JHMDB, sub-JHMDB, and SYSU 3D). Experimental results demonstrate that the proposed model outperforms the existing pose-based methods in human action recognition. (c) 2020 Author(s). All article content, except where otherwise noted, is licensed under a Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Pages: 10
Related Papers
50 records
  • [41] An End-to-End Spatial-Temporal Transformer Model for Surgical Action Triplet Recognition
    Zou, Xiaoyang
    Yu, Derong
    Tao, Rong
    Zheng, Guoyan
    12TH ASIAN-PACIFIC CONFERENCE ON MEDICAL AND BIOLOGICAL ENGINEERING, VOL 2, APCMBE 2023, 2024, 104 : 114 - 120
  • [42] Hierarchy Spatial-Temporal Transformer for Action Recognition in Short Videos
    Cai, Guoyong
    Cai, Yumeng
    FUZZY SYSTEMS AND DATA MINING VI, 2020, 331 : 760 - 774
  • [43] Temporal Hockey Action Recognition via Pose and Optical Flows
    Cai, Zixi
    Neher, Helmut
    Vats, Kanav
    Clausi, David A.
    Zelek, John
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW 2019), 2019, : 2543 - 2552
  • [44] Cross-dataset activity recognition via adaptive spatial-temporal transfer learning
    Qin X.
    Chen Y.
    Wang J.
    Yu C.
    Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 2019, 3 (04)
  • [45] Action Recognition Based on Spatial-Temporal Pyramid Sparse Coding
    Zhang, Xiaojing
    Zhang, Hua
    Cao, Xiaochun
    2012 21ST INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR 2012), 2012, : 1455 - 1458
  • [46] Hierarchical Spatial-Temporal Masked Contrast for Skeleton Action Recognition
    Cao, Wenming
    Zhang, Aoyu
    He, Zhihai
    Zhang, Yicha
    Yin, Xinpeng
    IEEE TRANSACTIONS ON ARTIFICIAL INTELLIGENCE, 2024, 5 (11): : 5801 - 5814
  • [47] Multimodal Fusion of Spatial-Temporal Features for Emotion Recognition in the Wild
    Wang, Zuchen
    Fang, Yuchun
    ADVANCES IN MULTIMEDIA INFORMATION PROCESSING - PCM 2017, PT I, 2018, 10735 : 205 - 214
  • [48] Multi-Branch Spatial-Temporal Network for Action Recognition
    Wang, Yingying
    Li, Wei
    Tao, Ran
    IEEE SIGNAL PROCESSING LETTERS, 2019, 26 (10) : 1556 - 1560
  • [49] StNet: Local and Global Spatial-Temporal Modeling for Action Recognition
    He, Dongliang
    Zhou, Zhichao
    Gan, Chuang
    Li, Fu
    Liu, Xiao
    Li, Yandong
    Wang, Limin
    Wen, Shilei
    THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019, : 8401 - 8408
  • [50] Robust UWB Indoor Localization for NLOS Scenes via Learning Spatial-Temporal Features
    Yang, Bo
    Li, Jun
    Shao, Zhanpeng
    Zhang, Hong
    IEEE SENSORS JOURNAL, 2022, 22 (08) : 7990 - 8000