Modeling the Uncertainty for Self-supervised 3D Skeleton Action Representation Learning

Cited by: 19
Authors
Su, Yukun [1]
Lin, Guosheng [2]
Sun, Ruizhou [1]
Hao, Yun [1]
Wu, Qingyao [1]
Affiliations
[1] South China Univ Technol, Guangzhou, Peoples R China
[2] Nanyang Technol Univ, Singapore, Singapore
Source
PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021 | 2021
Funding
National Natural Science Foundation of China; National Research Foundation, Singapore
Keywords
self-supervised; 3D skeleton action; uncertainty; probabilistic embedding; space;
DOI
10.1145/3474085.3475248
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Self-supervised learning (SSL) has proven very effective at learning representations from unlabeled data in the language and vision domains. Yet very few instrumental self-supervised approaches exist for 3D skeleton action understanding, and directly applying existing SSL methods from other domains to skeleton action learning may suffer from misaligned representations and other limitations. In this paper, we consider that a good representation learning encoder should distinguish the underlying features of different actions, pulling similar motions closer while pushing dissimilar motions apart. There exist, however, uncertainties in skeleton actions, arising from the inherent ambiguity of 3D skeleton poses under different viewpoints or from the sampling algorithm in contrastive learning; it is therefore ill-posed to differentiate action features in a deterministic embedding space. To address these issues, we rethink the distance between action features and propose to model each action representation in a probabilistic embedding space, which alleviates the uncertainties that arise when encountering ambiguous 3D skeleton inputs. To validate the effectiveness of the proposed method, extensive experiments are conducted on the Kinetics, NTU60, NTU120, and PKUMMD datasets with several alternative network architectures. Experimental evaluations demonstrate the superiority of our approach, which yields significant performance improvements without using extra labeled data.
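The idea of comparing actions as distributions rather than as points can be illustrated with a small sketch. The abstract does not say which distribution family, distance, or loss the authors use, so everything below is an assumption chosen for illustration: each embedding is a diagonal Gaussian given as a hypothetical `(mu, sigma)` pair, the distance is the squared 2-Wasserstein distance, and the contrastive objective is an InfoNCE-style loss. The function names `w2_squared` and `info_nce` are likewise made up for this example.

```python
import math


def w2_squared(mu_a, sigma_a, mu_b, sigma_b):
    """Squared 2-Wasserstein distance between two diagonal Gaussians.

    Assumption for illustration: embeddings are diagonal Gaussians with
    per-dimension means (mu) and standard deviations (sigma).
    """
    mean_term = sum((a - b) ** 2 for a, b in zip(mu_a, mu_b))
    spread_term = sum((a - b) ** 2 for a, b in zip(sigma_a, sigma_b))
    return mean_term + spread_term


def info_nce(anchor, positive, negatives, temperature=1.0):
    """InfoNCE-style loss that uses negative distance as the similarity.

    anchor, positive: (mu, sigma) pairs; negatives: list of (mu, sigma) pairs.
    """
    logits = [-w2_squared(*anchor, *positive) / temperature]
    logits += [-w2_squared(*anchor, *neg) / temperature for neg in negatives]
    # Numerically stable log-sum-exp over all candidates.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(v - m) for v in logits))
    # Negative log-probability of picking the positive among all candidates.
    return log_sum - logits[0]


# Two views of the same action (close means, similar spread) and one
# different action placed far away in the embedding space.
view_1 = ([0.0, 0.0], [1.0, 1.0])
view_2 = ([0.1, -0.1], [1.0, 1.2])
other = ([3.0, 4.0], [0.5, 0.5])
loss = info_nce(view_1, view_2, [other])
```

In this sketch the standard deviations contribute to the distance, so two inputs with the same mean but different spread are not treated as identical. That is one simple way an embedding can carry uncertainty about an ambiguous input.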
Pages: 769-778
Number of pages: 10