Learning spatial-temporal features via a pose-flow relational model for action recognition

Cited by: 2
Authors
Wu, Qianyu [1 ]
Hu, Fangqiang [1 ]
Zhu, Aichun [1 ,2 ]
Wang, Zixuan [1 ]
Bao, Yaping [1 ]
Affiliations
[1] Nanjing Tech Univ, Sch Comp Sci & Technol, Nanjing 210000, Peoples R China
[2] China Univ Min & Technol, Sch Informat & Control Engn, Xuzhou 221000, Jiangsu, Peoples R China
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation
DOI
10.1063/5.0011161
Chinese Library Classification
TB3 [Engineering Materials Science]
Subject Classification
0805; 080502
Abstract
Pose-based action recognition has long been an important research field in computer vision. However, most existing pose-based methods are built solely upon human skeleton data and therefore cannot exploit the features of motion-related objects, a crucial cue for discriminating human actions. To address this issue, we propose a novel pose-flow relational model that benefits from both pose dynamics and optical flow. First, we introduce a pose estimation module to extract the skeleton data of the key person from the raw video. Second, a hierarchical pose-based network is proposed to effectively explore the rich spatial-temporal features of human skeleton positions. Third, we embed an inflated 3D network to capture the subtle cues of motion-related objects from optical flow. Finally, we evaluate our model on four popular action recognition benchmarks (HMDB-51, JHMDB, sub-JHMDB, and SYSU 3D). Experimental results demonstrate that the proposed model outperforms existing pose-based methods in human action recognition. © 2020 Author(s). All article content, except where otherwise noted, is licensed under a Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
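To make the pipeline described in the abstract concrete, below is a minimal sketch of such a two-stream design: a pose branch over skeleton joint positions and an inflated-3D-style branch over stacked optical flow, fused by late concatenation for classification. This is not the authors' implementation; the class names (PoseBranch, FlowBranch, PoseFlowModel), all layer sizes, the GRU-based pose encoder, and the tiny 3D-conv stand-in for a full I3D network are assumptions made for illustration.

```python
# Hedged sketch of a pose-flow fusion model, assuming a GRU pose encoder and a
# small 3D-conv flow encoder; NOT the paper's released architecture.
import torch
import torch.nn as nn


class PoseBranch(nn.Module):
    """Encodes per-frame (x, y) joint coordinates; a GRU models the temporal
    dynamics (assumption standing in for the paper's hierarchical pose network)."""
    def __init__(self, num_joints=15, hidden=128):
        super().__init__()
        self.embed = nn.Linear(num_joints * 2, hidden)  # (x, y) per joint
        self.gru = nn.GRU(hidden, hidden, batch_first=True)

    def forward(self, pose):              # pose: (B, T, num_joints * 2)
        h = torch.relu(self.embed(pose))
        _, last = self.gru(h)             # last hidden state summarizes the clip
        return last.squeeze(0)            # (B, hidden)


class FlowBranch(nn.Module):
    """Small 3D-conv stand-in for the inflated 3D network applied to stacked
    optical flow (2 channels: horizontal/vertical displacement)."""
    def __init__(self, out=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(2, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),      # global spatio-temporal pooling
        )
        self.fc = nn.Linear(64, out)

    def forward(self, flow):              # flow: (B, 2, T, H, W)
        return self.fc(self.net(flow).flatten(1))  # (B, out)


class PoseFlowModel(nn.Module):
    """Late fusion of pose and flow features followed by a linear classifier
    (51 classes assumed, matching the HMDB-51 benchmark)."""
    def __init__(self, num_classes=51):
        super().__init__()
        self.pose = PoseBranch()
        self.flow = FlowBranch()
        self.cls = nn.Linear(128 + 128, num_classes)

    def forward(self, pose, flow):
        return self.cls(torch.cat([self.pose(pose), self.flow(flow)], dim=1))


# Smoke test with random tensors shaped like a short clip.
model = PoseFlowModel()
pose = torch.randn(4, 16, 15 * 2)         # batch of 4, 16 frames, 15 joints
flow = torch.randn(4, 2, 16, 56, 56)      # matching optical-flow stack
print(model(pose, flow).shape)            # torch.Size([4, 51])
```

Late concatenation is one plausible way to realize the "relational" fusion the abstract names; the paper may instead use attention or a dedicated relational module between the two streams.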
Pages: 10
Related Papers
50 records in total
  • [1] SAST: Learning Semantic Action-Aware Spatial-Temporal Features for Efficient Action Recognition
    Wang, Fei
    Wang, Guorui
    Huang, Yunwen
    Chu, Hao
    IEEE ACCESS, 2019, 7 : 164876 - 164886
  • [2] Human action recognition based on spatial-temporal relational model and LSTM-CNN framework
    Senthilkumar, N.
    Manimegalai, M.
    Karpakam, S.
    Ashokkumar, S. R.
    Premkumar, M.
    MATERIALS TODAY-PROCEEDINGS, 2022, 57 : 2087 - 2091
  • [3] Spatial-Temporal Attention for Action Recognition
    Sun, Dengdi
    Wu, Hanqing
    Ding, Zhuanlian
    Luo, Bin
    Tang, Jin
    ADVANCES IN MULTIMEDIA INFORMATION PROCESSING, PT I, 2018, 11164 : 854 - 864
  • [4] Human action recognition via multi-task learning base on spatial-temporal feature
    Guo, Wenzhong
    Chen, Guolong
    INFORMATION SCIENCES, 2015, 320 : 418 - 428
  • [5] Video-based Driver Action Recognition via Spatial-Temporal and Motion Deep Learning
    Ma, Fangzhi
    Xing, Guanyu
    Liu, Yanli
    2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN, 2023,
  • [6] Spatial-temporal dynamic hand gesture recognition via hybrid deep learning model
    Li, Jinghua
    Huai, Huarui
    Gao, Junbin
    Kong, Dehui
    Wang, Lichun
    JOURNAL ON MULTIMODAL USER INTERFACES, 2019, 13 (04) : 363 - 371
  • [7] Fusion of spatial-temporal and kinematic features for gait recognition with deterministic learning
    Deng, Muqing
    Wang, Cong
    Cheng, Fengjiang
    Zeng, Wei
    PATTERN RECOGNITION, 2017, 67 : 186 - 200
  • [8] Evaluation of local spatial-temporal features for cross-view action recognition
    Gao, Zan
    Nie, Weizhi
    Liu, Anan
    Zhang, Hua
    NEUROCOMPUTING, 2016, 173 : 110 - 117