Self-Supervised Video Pose Representation Learning for Occlusion-Robust Action Recognition

Authors
Yang, Di [1, 2]
Wang, Yaohui [1, 2]
Dantcheva, Antitza [1, 2]
Garattoni, Lorenzo [3]
Francesca, Gianpiero [3]
Bremond, Francois [1, 2]
Affiliations
[1] INRIA, Le Chesnay, France
[2] Univ Côte d'Azur, Nice, France
[3] Toyota Motor Europe, Brussels, Belgium
Source
2021 16TH IEEE INTERNATIONAL CONFERENCE ON AUTOMATIC FACE AND GESTURE RECOGNITION (FG 2021) | 2021
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Action recognition based on human pose has attracted increasing attention due to its robustness to changes in appearance, environment, and viewpoint. Despite this progress, one remaining challenge is occlusion in real-world videos, which prevents all joints from being visible. Models trained on full-body pose data, captured in laboratory conditions with specific sensors, struggle to represent such occluded scenes. To address this, as a first contribution, we introduce OR-VPE, a novel video pose embedding network designed to learn an occlusion-robust representation for pose sequences in videos. To enable the embedding network to handle partially visible joints, we incorporate a sub-graph data augmentation mechanism, which simulates occlusions during training, into a video pose encoder based on Graph Convolutional Networks (GCNs). As a second contribution, we apply a contrastive learning module to train the video pose representation in a self-supervised manner, without requiring action annotations. This is achieved by maximizing the mutual information between different spatio-temporal subgraphs pruned from the same pose sequence. Experimental analyses show that, compared with training the same encoder from scratch, OR-VPE pre-trained on the large-scale NTU-RGB+D 120 dataset improves downstream action classification on the Toyota Smarthome, N-UCLA, and Penn Action datasets.
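The abstract combines two mechanisms: an occlusion-simulating sub-graph augmentation and a contrastive objective that maximizes mutual information between two pruned views of the same pose sequence. The dependency-free sketch below illustrates both under loudly stated assumptions. The 7-joint skeleton, BFS-grown joint hiding with zero-filling, contiguous temporal cropping, the mean-pooling `toy_encode` stand-in for the GCN encoder, and the InfoNCE loss with temperature `tau` (a common contrastive lower bound on mutual information) are all hypothetical illustrative choices, not the authors' OR-VPE implementation.

```python
import math
import random

# Hypothetical 7-joint toy skeleton (NOT the NTU-RGB+D 25-joint layout):
# 0 pelvis, 1 spine, 2 head, 3 left hand, 4 right hand, 5 left foot, 6 right foot.
EDGES = [(0, 1), (1, 2), (1, 3), (1, 4), (0, 5), (0, 6)]
NUM_JOINTS = 7


def neighbors(edges, num_joints):
    """Build an adjacency list for the undirected skeleton graph."""
    adj = [[] for _ in range(num_joints)]
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    return adj


def subgraph_augment(seq, adj, drop_joints, keep_frames, rng):
    """Simulate occlusion on a pose sequence (frames x joints x (x, y)).

    Assumed scheme (illustrative, not the paper's exact recipe):
    - spatial: hide a connected set of `drop_joints` (>= 1) joints grown by
      breadth-first search from a random root, like an occluded body part;
      hidden joints are zero-filled;
    - temporal: keep one random contiguous window of `keep_frames` frames.
    Returns the pruned sequence and the per-joint visibility mask.
    """
    root = rng.randrange(len(adj))
    hidden, frontier = {root}, [root]
    while frontier and len(hidden) < drop_joints:
        joint = frontier.pop(0)
        for nb in adj[joint]:
            if nb not in hidden and len(hidden) < drop_joints:
                hidden.add(nb)
                frontier.append(nb)
    visible = [j not in hidden for j in range(len(adj))]
    start = rng.randrange(len(seq) - keep_frames + 1)
    window = seq[start:start + keep_frames]
    pruned = [[pt if visible[j] else (0.0, 0.0) for j, pt in enumerate(frame)]
              for frame in window]
    return pruned, visible


def toy_encode(seq):
    """Hypothetical stand-in for the GCN encoder: per-joint mean (x, y)."""
    emb = []
    for j in range(len(seq[0])):
        emb.append(sum(frame[j][0] for frame in seq) / len(seq))
        emb.append(sum(frame[j][1] for frame in seq) / len(seq))
    return emb


def info_nce(z1, z2, tau=0.1):
    """InfoNCE loss: z1[i] must pick out its positive z2[i] among all z2[k]."""
    def unit(v):
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        return [x / norm for x in v]

    a = [unit(v) for v in z1]
    b = [unit(v) for v in z2]
    total = 0.0
    for i, ai in enumerate(a):
        # Cosine similarities scaled by the temperature.
        logits = [sum(x * y for x, y in zip(ai, bk)) / tau for bk in b]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(s - m) for s in logits))
        total += log_denom - logits[i]
    return total / len(a)


# Usage on synthetic data: 4 random "pose sequences" of 16 frames each.
rng = random.Random(0)
adj = neighbors(EDGES, NUM_JOINTS)
batch = [[[(rng.random(), rng.random()) for _ in range(NUM_JOINTS)]
          for _ in range(16)] for _ in range(4)]
view1 = [toy_encode(subgraph_augment(s, adj, 2, 12, rng)[0]) for s in batch]
view2 = [toy_encode(subgraph_augment(s, adj, 2, 12, rng)[0]) for s in batch]
print(round(info_nce(view1, view2), 4))
```

Minimizing `info_nce` pulls the two occluded views of each sequence together while pushing apart views of different sequences. In a real training setup, the embeddings would come from a trainable GCN encoder and the loss would be back-propagated through it; the fixed `toy_encode` here only makes the data flow visible.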
Pages: 5