A Spatio-temporal Transformer for 3D Human Motion Prediction

Cited by: 117
Authors
Aksan, Emre [1]
Kaufmann, Manuel [1]
Cao, Peng [2, 3]
Hilliges, Otmar [1]
Affiliations
[1] Swiss Fed Inst Technol, Dept Comp Sci, Zurich, Switzerland
[2] MIT, Cambridge, MA 02139 USA
[3] Peking Univ, Beijing, Peoples R China
Source
2021 INTERNATIONAL CONFERENCE ON 3D VISION (3DV 2021) | 2021
Keywords
DOI
10.1109/3DV53792.2021.00066
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We propose a novel Transformer-based architecture for the task of generative modelling of 3D human motion. Previous work commonly relies on RNN-based models that consider shorter forecast horizons and quickly reach a stationary, often implausible state. Recent studies show that implicit temporal representations in the frequency domain are also effective in making predictions for a predetermined horizon. Our focus lies on learning spatio-temporal representations autoregressively and hence on generating plausible future developments over both the short and the long term. The proposed model learns high-dimensional embeddings for skeletal joints and how to compose a temporally coherent pose via a decoupled temporal and spatial self-attention mechanism. Our dual attention concept allows the model to access current and past information directly and to capture both the structural and the temporal dependencies explicitly. We show empirically that this effectively learns the underlying motion dynamics and reduces the error accumulation over time that is observed in auto-regressive models. Our model is able to make accurate short-term predictions and generate plausible motion sequences over long horizons. We make our code publicly available at https://github.com/eth-ait/motion-transformer.
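The decoupled temporal and spatial self-attention described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository linked above for that). It assumes one embedding vector per joint per frame, uses the embeddings directly as queries, keys, and values (no learned projections, heads, or feed-forward layers), and assumes the two attention outputs are combined by summation. The names `attend` and `spatio_temporal_attention` are hypothetical.

```python
import math


def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    # Scaled dot-product attention for a single query vector.
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


def spatio_temporal_attention(x):
    # x[t][j] is the embedding (list of floats) of joint j at frame t.
    num_frames, num_joints = len(x), len(x[0])
    out = []
    for t in range(num_frames):
        frame = []
        for j in range(num_joints):
            # Temporal attention: the same joint over the current and past
            # frames only (causal), so future frames never leak in.
            past = [x[s][j] for s in range(t + 1)]
            temporal = attend(x[t][j], past, past)
            # Spatial attention: all joints within the current frame.
            spatial = attend(x[t][j], x[t], x[t])
            # Assumption: combine the two streams by summation.
            frame.append([a + b for a, b in zip(temporal, spatial)])
        out.append(frame)
    return out
```

Because the temporal stream only looks at frames up to `t`, the output for a frame does not change when later frames change, which is the property an autoregressive model needs in order to predict one frame at a time.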
Pages: 565-574
Number of pages: 10