GAN-Poser: an improvised bidirectional GAN model for human motion prediction

Citations: 23
Authors
Jain, Deepak Kumar [1]
Zareapoor, Masoumeh [2]
Jain, Rachna [3]
Kathuria, Abhishek [3]
Bachhety, Shivam [3]
Affiliations
[1] Chongqing Univ Posts & Telecommun, Key Lab Intelligent Air Ground Cooperat Control U, Coll Automat, Chongqing, Peoples R China
[2] Shanghai Jiao Tong Univ, Sch Elect Informat & Elect Engn, Shanghai, Peoples R China
[3] Bharati Vidyapeeths Coll Engn, Dept Comp Sci & Engn, New Delhi, India
Keywords
Human motion; GAN; Probability theory; Pose estimation; Sequence model; 3D model; NEURAL-NETWORKS; 3D;
DOI
10.1007/s00521-020-04941-4
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
GAN-Poser is a novel method, based on a generator-discriminator framework, that predicts human motion in less time given an input 3D human skeleton sequence. Specifically, rather than using the conventional Euclidean loss, a frame-wise geodesic loss is used for geometrically meaningful and more precise distance measurement. In this paper, we use a bidirectional GAN framework along with a recursive prediction strategy to avoid mode collapse and to further regularize the training. To generate multiple probable human-pose sequences conditioned on a given starting sequence, a random extrinsic factor Theta is also introduced. The discriminator is trained to regress the extrinsic factor Theta, which is used alongside the intrinsic factor (the encoded starting pose sequence) to generate a particular pose sequence. Although the framework is probabilistic, the modified discriminator architecture allows predictions of an intermediate part of the pose sequence to be used as conditioning for prediction of the latter part of the sequence. This adversarial learning-based model takes stochasticity into account, and the bidirectional setup provides a new way to evaluate the prediction quality against a given test sequence. The resulting method, GAN-Poser, achieves superior performance over state-of-the-art deep learning approaches when evaluated on the standard NTU-RGB-D and Human3.6M datasets.
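The abstract names a frame-wise geodesic loss but does not state its formula. As a rough illustration only, the sketch below assumes each joint pose is a 3x3 rotation matrix and uses the standard geodesic distance on SO(3), theta = arccos((trace(R_a^T R_b) - 1) / 2), averaged over frames and joints. The function names, the nested-list data layout, and the averaging are assumptions made for this example and are not taken from the paper.

```python
import math


def rot_z(angle):
    """Build a 3x3 rotation about the z-axis (helper for the example only)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def geodesic_distance(r_a, r_b):
    """Angle in radians of the relative rotation r_a^T r_b on SO(3)."""
    # trace(r_a^T r_b) equals the sum of elementwise products of the matrices.
    trace = sum(r_a[i][j] * r_b[i][j] for i in range(3) for j in range(3))
    cos_theta = (trace - 1.0) / 2.0
    # Clamp to guard against floating-point drift outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, cos_theta)))


def framewise_geodesic_loss(pred, target):
    """Mean geodesic distance over all frames and joints.

    pred, target: lists of frames; each frame is a list of 3x3 joint
    rotations (an assumed layout, not the paper's).
    """
    total, count = 0.0, 0
    for frame_p, frame_t in zip(pred, target):
        for r_p, r_t in zip(frame_p, frame_t):
            total += geodesic_distance(r_p, r_t)
            count += 1
    return total / count


# Two frames with one joint each, rotated about z by known angles.
predicted = [[rot_z(0.1)], [rot_z(0.4)]]
reference = [[rot_z(0.0)], [rot_z(0.2)]]
loss = framewise_geodesic_loss(predicted, reference)
```

Unlike a Euclidean difference of matrix entries or Euler angles, this distance is the actual rotation angle separating two poses, which is one plausible reading of "geometrically meaningful" in the abstract. In the example the per-frame angles are 0.1 and 0.2 radians, so the mean is their average.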
Pages: 14579-14591
Page count: 13