Learning spatial-temporal features via a pose-flow relational model for action recognition

Cited by: 2
Authors
Wu, Qianyu [1 ]
Hu, Fangqiang [1 ]
Zhu, Aichun [1 ,2 ]
Wang, Zixuan [1 ]
Bao, Yaping [1 ]
Affiliations
[1] Nanjing Tech Univ, Sch Comp Sci & Technol, Nanjing 210000, Peoples R China
[2] China Univ Min & Technol, Sch Informat & Control Engn, Xuzhou 221000, Jiangsu, Peoples R China
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation
DOI
10.1063/5.0011161
Chinese Library Classification
TB3 [Engineering Materials Science]
Subject Classification Code
0805; 080502
Abstract
Pose-based action recognition has always been an important research field in computer vision. However, most existing pose-based methods are built upon human skeleton data alone and therefore cannot exploit features of the motion-related object, a crucial cue for discriminating human actions. To address this issue, we propose a novel pose-flow relational model, which can benefit from both pose dynamics and optical flow. First, we introduce a pose estimation module to extract the skeleton data of the key person from the raw video. Second, a hierarchical pose-based network is proposed to effectively explore the rich spatial-temporal features of human skeleton positions. Third, we embed an inflated 3D network to capture the subtle cues of the motion-related object from optical flow. Additionally, we evaluate our model on four popular action recognition benchmarks (HMDB-51, JHMDB, sub-JHMDB, and SYSU 3D). Experimental results demonstrate that the proposed model outperforms the existing pose-based methods in human action recognition. (c) 2020 Author(s). All article content, except where otherwise noted, is licensed under a Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Pages: 10
Related Papers
50 records total
  • [21] Learning Heterogeneous Spatial-Temporal Context for Skeleton-Based Action Recognition
    Gao, Xuehao
    Yang, Yang
    Wu, Yang
    Du, Shaoyi
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (09) : 12130 - 12141
  • [22] Action Recognition by Joint Spatial-Temporal Motion Feature
    Zhang, Weihua
    Zhang, Yi
    Gao, Chaobang
    Zhou, Jiliu
    JOURNAL OF APPLIED MATHEMATICS, 2013,
  • [23] Spatial-Temporal Separable Attention for Video Action Recognition
    Guo, Xi
    Hu, Yikun
    Chen, Fang
    Jin, Yuhui
    Qiao, Jian
    Huang, Jian
    Yang, Qin
    2022 INTERNATIONAL CONFERENCE ON FRONTIERS OF ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING, FAIML, 2022, : 224 - 228
  • [24] Spatial-Temporal Pyramid Graph Reasoning for Action Recognition
    Geng, Tiantian
    Zheng, Feng
    Hou, Xiaorong
    Lu, Ke
    Qi, Guo-Jun
    Shao, Ling
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2022, 31 : 5484 - 5497
  • [25] Spatial-Temporal Convolutional Attention Network for Action Recognition
    Luo, Huilan
    Chen, Han
COMPUTER ENGINEERING AND APPLICATIONS, 2023, 59 (09): 150 - 158
  • [26] Action recognition with spatial-temporal discriminative filter banks
    Martinez, Brais
    Modolo, Davide
    Xiong, Yuanjun
    Tighe, Joseph
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 5481 - 5490
  • [27] Grouped Spatial-Temporal Aggregation for Efficient Action Recognition
    Luo, Chenxu
    Yuille, Alan
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 5511 - 5520
  • [28] Select and Focus: Action Recognition with Spatial-Temporal Attention
    Chan, Wensong
    Tian, Zhiqiang
    Liu, Shuai
    Ren, Jing
    Lan, Xuguang
    INTELLIGENT ROBOTICS AND APPLICATIONS, ICIRA 2019, PT III, 2019, 11742 : 461 - 471
  • [29] Spatial-Temporal Interleaved Network for Efficient Action Recognition
    Jiang, Shengqin
    Zhang, Haokui
    Qi, Yuankai
    Liu, Qingshan
    IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, 2025, 21 (01) : 178 - 187
  • [30] Exploring a rich spatial-temporal dependent relational model for skeleton-based action recognition by bidirectional LSTM-CNN
    Zhu, Aichun
    Wu, Qianyu
    Cui, Ran
    Wang, Tian
    Hang, Wenlong
    Hua, Gang
    Snoussi, Hichem
    NEUROCOMPUTING, 2020, 414 : 90 - 100