Learning spatial-temporal features via a pose-flow relational model for action recognition

Cited by: 2
Authors
Wu, Qianyu [1]
Hu, Fangqiang [1]
Zhu, Aichun [1,2]
Wang, Zixuan [1]
Bao, Yaping [1]
Affiliations
[1] Nanjing Tech Univ, Sch Comp Sci & Technol, Nanjing 210000, Peoples R China
[2] China Univ Min & Technol, Sch Informat & Control Engn, Xuzhou 221000, Jiangsu, Peoples R China
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation
DOI
10.1063/5.0011161
Chinese Library Classification
TB3 [Engineering Materials Science]
Discipline Classification Codes
0805; 080502
Abstract
Pose-based action recognition has long been an important research field in computer vision. However, most existing pose-based methods are built upon human skeleton data alone, which cannot capture features of motion-related objects, a crucial cue for discriminating human actions. To address this issue, we propose a novel pose-flow relational model that benefits from both pose dynamics and optical flow. First, we introduce a pose estimation module to extract the skeleton of the key person from the raw video. Second, a hierarchical pose-based network is proposed to effectively explore the rich spatial-temporal features of human skeleton positions. Third, we embed an inflated 3D network to capture subtle cues of motion-related objects from optical flow. Finally, we evaluate our model on four popular action recognition benchmarks (HMDB-51, JHMDB, sub-JHMDB, and SYSU 3D). Experimental results demonstrate that the proposed model outperforms existing pose-based methods in human action recognition. (c) 2020 Author(s). All article content, except where otherwise noted, is licensed under a Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
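To make the pipeline described in the abstract concrete, the sketch below shows one plausible way to fuse a pose stream with an optical-flow stream in PyTorch. It illustrates the two-stream idea only: the layer sizes, the 15-joint skeleton layout, the small 3D CNN standing in for the inflated 3D (I3D) network, and the late-fusion classifier are all assumptions for illustration, not the architecture reported in the paper.

```python
import torch
import torch.nn as nn

class PoseFlowSketch(nn.Module):
    """Illustrative pose + optical-flow fusion (hypothetical sizes)."""

    def __init__(self, num_joints=15, num_classes=51):
        super().__init__()
        # Pose stream: temporal 1D convolutions over stacked (x, y)
        # joint coordinates -> one fixed-length pose feature.
        self.pose_stream = nn.Sequential(
            nn.Conv1d(num_joints * 2, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(64, 128, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        # Flow stream: a tiny 3D CNN as a stand-in for the I3D
        # network applied to stacked optical-flow fields.
        self.flow_stream = nn.Sequential(
            nn.Conv3d(2, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv3d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
        )
        # Late fusion: concatenate both features, then classify.
        self.classifier = nn.Linear(128 + 64, num_classes)

    def forward(self, pose, flow):
        # pose: (batch, num_joints * 2, frames)
        # flow: (batch, 2, frames, height, width)
        p = self.pose_stream(pose).flatten(1)  # (batch, 128)
        f = self.flow_stream(flow).flatten(1)  # (batch, 64)
        return self.classifier(torch.cat([p, f], dim=1))

model = PoseFlowSketch()
pose = torch.randn(2, 30, 32)         # 15 joints x (x, y), 32 frames
flow = torch.randn(2, 2, 32, 56, 56)  # horizontal + vertical flow
print(model(pose, flow).shape)        # torch.Size([2, 51])
```

In the paper itself the pose network is hierarchical and the flow stream is an inflated 3D network; this sketch only shows how skeleton sequences and optical-flow clips can be combined into a single prediction.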
Pages: 10
Related Papers
50 records in total (items [31]-[40] shown)
  • [31] Advanced skeleton-based action recognition via spatial-temporal rotation descriptors
    Shen, Zhongwei
    Wu, Xiao-Jun
    Kittler, Josef
    PATTERN ANALYSIS AND APPLICATIONS, 2021, 24 (03): 1335-1346
  • [33] Learning Effective Spatial-Temporal Features for sEMG Armband-Based Gesture Recognition
    Zhang, Yingwei
    Chen, Yiqiang
    Yu, Hanchao
    Yang, Xiaodong
    Lu, Wang
    IEEE INTERNET OF THINGS JOURNAL, 2020, 7 (08): 6979-6992
  • [34] Learning to effectively model spatial-temporal heterogeneity for traffic flow forecasting
    Xu, Minrui
    Li, Xiyang
    Wang, Fucheng
    Shang, Jedi S.
    Chong, Tai
    Cheng, Wanjun
    Xu, Jiajie
    WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS, 2023, 26 (03): 849-865
  • [35] A Separable Spatial-Temporal Graph Learning Approach for Skeleton-Based Action Recognition
    Zheng, Hui
    Zhao, Ye-Sheng
    Zhang, Bo
    Shang, Guo-Qiang
    IEEE SENSORS LETTERS, 2024, 8 (11)
  • [36] Streamer action recognition in live video with spatial-temporal attention and deep dictionary learning
    Li, Chenhao
    Zhang, Jing
    Yao, Jiacheng
    NEUROCOMPUTING, 2021, 453: 383-392
  • [37] Spatial-temporal interaction learning based two-stream network for action recognition
    Liu, Tianyu
    Ma, Yujun
    Yang, Wenhan
    Ji, Wanting
    Wang, Ruili
    Jiang, Ping
    INFORMATION SCIENCES, 2022, 606: 864-876
  • [38] Spatial-temporal saliency action mask attention network for action recognition
    Jiang, Min
    Pan, Na
    Kong, Jun
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2020, 71
  • [39] Joints-Centered Spatial-Temporal Features Fused Skeleton Convolution Network for Action Recognition
    Song, Wenfeng
    Chu, Tangli
    Li, Shuai
    Li, Nannan
    Hao, Aimin
    Qin, Hong
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26: 4602-4616
  • [40] Human Action Recognition by Decision-Making Level Fusion Based on Spatial-Temporal Features
    Li, Yandi
    Xu, Xiping
    ACTA OPTICA SINICA, 2018, 38 (08)