Self-Supervised Video Pose Representation Learning for Occlusion-Robust Action Recognition

Cited by: 0

Authors:

Yang, Di ^{[1,2]}

Wang, Yaohui ^{[1,2]}

Dantcheva, Antitza ^{[1,2]}

Garattoni, Lorenzo ^{[3]}

Francesca, Gianpiero ^{[3]}

Bremond, Francois ^{[1,2]}

Affiliations:

[1] INRIA, Le Chesnay, France

[2] Univ Cote d'Azur, Nice, France

[3] Toyota Motor Europe, Brussels, Belgium

Source:

2021 16TH IEEE INTERNATIONAL CONFERENCE ON AUTOMATIC FACE AND GESTURE RECOGNITION (FG 2021) | 2021

Keywords:

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Action recognition based on human pose has attracted increasing attention due to its robustness to changes in appearance, environment, and viewpoint. Despite this progress, one remaining challenge is occlusion in real-world videos, which prevents some joints from being visible. Such occlusion hinders models trained on full-body pose data, captured in laboratory conditions with specific sensors, from representing these scenes. To address this, as a first contribution, we introduce OR-VPE, a novel video pose embedding network designed to learn an occlusion-robust representation for pose sequences in videos. To enable the embedding network to handle partially visible joints, we incorporate a sub-graph data augmentation mechanism, which simulates occlusions during training, into a video pose encoder based on Graph Convolutional Networks (GCNs). As a second contribution, we apply a contrastive learning module to train the video pose representation in a self-supervised manner, without requiring action annotations. This is achieved by maximizing the mutual information between different spatio-temporal subgraphs pruned from the same pose sequence. Experimental analyses show that, compared to training the same encoder from scratch, the proposed OR-VPE, pre-trained on the large-scale NTU-RGB+D 120 dataset, improves downstream action classification performance on the Toyota Smarthome, N-UCLA and Penn Action datasets.
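
The two ideas in the abstract, pruning a pose sequence into sub-graphs to simulate occlusion and contrasting two pruned views of the same sequence, can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names `prune_pose_sequence` and `info_nce_loss`, the keep ratios, the contiguous temporal window, and the use of an InfoNCE loss (a commonly used lower-bound proxy for mutual information) are all assumptions made here for illustration, and the GCN encoder is omitted entirely.

```python
import numpy as np

def prune_pose_sequence(poses, joint_keep_ratio, frame_keep_ratio, rng):
    """Keep a random joint subset inside a random contiguous frame window.

    poses: array of shape (frames, joints, coords). Everything outside the
    kept sub-graph is zeroed, which stands in for occluded joints
    (hypothetical pruning rule, assumed for this sketch).
    """
    num_frames, num_joints, _ = poses.shape
    num_keep_joints = max(1, int(round(joint_keep_ratio * num_joints)))
    num_keep_frames = max(1, int(round(frame_keep_ratio * num_frames)))
    kept_joints = rng.choice(num_joints, size=num_keep_joints, replace=False)
    start = rng.integers(0, num_frames - num_keep_frames + 1)
    mask = np.zeros((num_frames, num_joints), dtype=bool)
    mask[start:start + num_keep_frames, kept_joints] = True
    return poses * mask[:, :, None], mask

def info_nce_loss(z_a, z_b, temperature=0.1):
    """InfoNCE over two batches of embeddings of shape (batch, dim).

    Row i of z_a and row i of z_b are treated as a positive pair (two views
    of the same sequence); all other rows in the batch act as negatives.
    """
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

# Toy data: 30 frames, 25 joints, 3-D coordinates (random values, not real poses).
rng = np.random.default_rng(0)
poses = rng.normal(size=(30, 25, 3))
view_a, mask_a = prune_pose_sequence(poses, 0.6, 0.8, rng)
view_b, mask_b = prune_pose_sequence(poses, 0.6, 0.8, rng)
```

In a full pipeline, each pruned view would pass through the pose encoder, and the resulting embeddings would be fed to the contrastive loss so that two views of the same sequence are pulled together while views of different sequences are pushed apart.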

Pages: 5

50 records in total

[1] Collaboratively Self-Supervised Video Representation Learning for Action Recognition
Zhang, Jie
Wan, Zhifan
Hu, Lanqing
Lin, Stephen
Wu, Shuzhe
Shan, Shiguang
IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, 2025, 20 : 1895 - 1907
[2] Self-Supervised Learning for Action Recognition by Video Denoising
Thi Thu Trang Phung
Thi Hong Thu Ma
Van Truong Nguyen
Duc Quang Vu
2021 RIVF INTERNATIONAL CONFERENCE ON COMPUTING AND COMMUNICATION TECHNOLOGIES (RIVF 2021), 2021, : 76 - 81
[3] Self-Supervised EEG Representation Learning for Robust Emotion Recognition
Liu, Huan
Zhang, Yuzhe
Chen, Xuxu
Zhang, Dalin
Li, Rui
Qin, Tao
ACM TRANSACTIONS ON SENSOR NETWORKS, 2024, 20 (05)
[4] Occlusion-Robust Object Pose Estimation with Holistic Representation
Chen, Bo
Chin, Tat-Jun
Klimavicius, Marius
2022 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2022), 2022, : 2223 - 2233
[5] Spatiotemporal consistency enhancement self-supervised representation learning for action recognition
Bi, Shuai
Hu, Zhengping
Zhao, Mengyao
Li, Shufang
Sun, Zhe
SIGNAL IMAGE AND VIDEO PROCESSING, 2023, 17 (04) : 1485 - 1492
[6] SELF-SUPERVISED REPRESENTATION LEARNING FOR ULTRASOUND VIDEO
Jiao, Jianbo
Droste, Richard
Drukker, Lior
Papageorghiou, Aris T.
Noble, J. Alison
2020 IEEE 17TH INTERNATIONAL SYMPOSIUM ON BIOMEDICAL IMAGING (ISBI 2020), 2020, : 1847 - 1850
[7] Spatiotemporal consistency enhancement self-supervised representation learning for action recognition
Shuai Bi
Zhengping Hu
Mengyao Zhao
Shufang Li
Zhe Sun
Signal, Image and Video Processing, 2023, 17 : 1485 - 1492
[8] Self-supervised learning for robust video indexing
Ewerth, Ralph
Freisleben, Bernd
2006 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO - ICME 2006, VOLS 1-5, PROCEEDINGS, 2006, : 1749 - +
[9] Occlusion-Robust Model Learning for Human Pose Estimation
Kawana, Yuki
Ukita, Norimichi
PROCEEDINGS 3RD IAPR ASIAN CONFERENCE ON PATTERN RECOGNITION ACPR 2015, 2015, : 494 - 498
[10] Self-Supervised Video Representation Learning by Video Incoherence Detection
Cao, Haozhi
Xu, Yuecong
Mao, Kezhi
Xie, Lihua
Yin, Jianxiong
See, Simon
Xu, Qianwen
Yang, Jianfei
IEEE TRANSACTIONS ON CYBERNETICS, 2024, 54 (06) : 3810 - 3822
