Semantic Correlation Attention-Based Multiorder Multiscale Feature Fusion Network for Human Motion Prediction

Times cited: 3
Authors
Li, Qin [1]
Wang, Yong [1]
Lv, Fanbing [2]
Affiliations
[1] Cent South Univ, Sch Automat, Changsha 410083, Peoples R China
[2] Changsha Hisense Intelligent Syst Res Inst Co Ltd, Res & Dev Dept, Changsha 410208, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Semantics; Feature extraction; Correlation; Joints; Predictive models; Bones; Decoding; Feature fusion; graph neural network (GNN); human motion prediction; semantic correlation attention;
DOI
10.1109/TCYB.2022.3184977
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Human motion prediction aims to predict future human states from observed human states. However, current research ignores the semantic correlations between body parts (joints and bones) and motion time in the observed human states, which limits prediction accuracy. To address this issue, we propose a novel semantic correlation attention-based multiorder multiscale feature fusion network (SCAFF), which consists of an encoder and a decoder. In the encoder, a multiorder difference calculation module (MODC) computes the multiorder difference information of the joint and bone attributes in the observed human states. Multiple semantic correlation attention-based graph calculation operators (SCA-GCOs) are then stacked to extract multiscale features of the multiorder difference information. Each SCA-GCO captures the joint and bone dependencies of the multiorder difference information, refines them with a semantic correlation attention module (SCAM), and captures the temporal dynamics of the refined dependencies as its output features. SCAM learns a semantic attention mask that describes the semantic correlations between body parts and motion time and uses it for feature refinement. Afterward, multiple multiorder feature fusion modules (MOFFs) and multiscale feature fusion modules (MSFFs) fuse the multiscale features extracted by the SCA-GCOs to obtain the motion features of the observed human states. Based on these motion features, the decoder recurrently applies a composite gated recurrent module (CGRM) and multilayer perceptrons (MLPs) to predict future human states. To the best of our knowledge, this is the first attempt to consider the semantic correlations between body parts and motion time in human motion prediction. Results on public datasets demonstrate that SCAFF outperforms existing models.
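The abstract does not specify how the multiorder difference information is computed. As a rough illustration only, the sketch below assumes that joint attributes are 3D positions, that bone attributes are child-minus-parent joint offsets, and that the k-th order difference is the k-th finite difference along the time axis, zero-padded to keep the frame count. The function names `bone_vectors` and `multiorder_differences`, the `parents` list, and the toy skeleton are all hypothetical and are not taken from the paper.

```python
import numpy as np

def bone_vectors(joints, parents):
    """Assumed bone attribute: offset of each joint from its parent joint.

    joints:  array of shape (T, J, 3), T frames and J joints.
    parents: length-J sequence of parent indices, -1 for the root joint.
    """
    bones = np.zeros_like(joints)
    for j, p in enumerate(parents):
        if p >= 0:
            bones[:, j] = joints[:, j] - joints[:, p]
    return bones

def multiorder_differences(x, max_order=2):
    """Return [x, 1st-order diff, ..., max_order-th diff] along time.

    Each difference is zero-padded at the front so every array keeps T frames.
    """
    feats = [x]
    for k in range(1, max_order + 1):
        d = np.diff(x, n=k, axis=0)  # shape (T - k, ...)
        pad = np.zeros((k,) + x.shape[1:], dtype=x.dtype)
        feats.append(np.concatenate([pad, d], axis=0))
    return feats

# Toy two-joint skeleton (hypothetical data): joint 0 moves linearly in x,
# joint 1 moves quadratically in y and has joint 0 as its parent.
t = np.arange(4.0)
joints = np.zeros((4, 2, 3))
joints[:, 0, 0] = t
joints[:, 1, 1] = t ** 2
parents = [-1, 0]

bones = bone_vectors(joints, parents)
joint_feats = multiorder_differences(joints, max_order=2)
bone_feats = multiorder_differences(bones, max_order=2)
```

Under these assumptions the first-order difference plays the role of a per-frame velocity and the second-order difference that of an acceleration. The graph operators, attention module, and fusion modules described in the abstract are not modeled here.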
Pages: 825-838
Number of pages: 14