Global-local contrastive multiview representation learning for skeleton-based action recognition

Cited by: 5

Authors:

Bian, Cunling [1]

Feng, Wei [1]

Meng, Fanbo [2]

Wang, Song [3]

Affiliations:

[1] Tianjin Univ, Coll Intelligence & Comp, Sch Comp Sci & Technol, Tianjin 300350, Peoples R China

[2] Tianjin Univ, Inst Int Engn, Tianjin 300350, Peoples R China

[3] Univ South Carolina, Dept Comp Sci & Engn, Columbia, SC 29208 USA

Source:

COMPUTER VISION AND IMAGE UNDERSTANDING | 2023 / Vol. 229

Funding:

National Natural Science Foundation of China;

Keywords:

Skeleton-based action recognition; Contrastive representation learning; Multiview; Graph convolutional network; DEEPER;

DOI:

10.1016/j.cviu.2023.103655

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Skeleton-based human action recognition has drawn growing interest because of its low sensitivity to appearance changes and the increasing availability of skeleton data. However, skeletons captured in practice are sensitive to the view of the actor, owing to occlusion of different human-body joints and errors in joint localization. Each view is noisy and incomplete, but important factors, such as motion and semantics, should be shared across all views in action representation learning. We support the classic hypothesis that a powerful representation is one that models view-invariant factors, and that this also holds for unsupervised learning. We therefore study this hypothesis under the framework of contrastive multiview learning, learning a representation for action recognition that aims to maximize the mutual information between different views of the same action sequence. In addition, a global-local contrastive loss is proposed to model the multi-scale co-occurrence relationships in both the spatial and temporal domains. Extensive experimental results show that the proposed method significantly boosts the performance of unsupervised skeleton-based human action recognition methods on three challenging benchmarks: PKU-MMD, NTU RGB+D 60, and NTU RGB+D 120.
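
The abstract does not give the form of the contrastive objective. As a rough illustration only, the sketch below uses the widely used InfoNCE loss, which is assumed here and is not confirmed as the authors' loss. Minimizing it raises a lower bound on the mutual information between two views. The function name `info_nce`, the temperature value, and the toy embeddings are all hypothetical, and the global-local, multi-scale part of the proposed loss is not modeled.

```python
import math


def _normalize(vector):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def info_nce(view_a, view_b, temperature=0.1):
    """Average InfoNCE loss over a batch (an assumed stand-in, not the paper's loss).

    view_a[i] and view_b[i] are embeddings of the same action sequence seen
    from two views (the positive pair); every other row of view_b serves as
    a negative for view_a[i].
    """
    a = [_normalize(v) for v in view_a]
    b = [_normalize(v) for v in view_b]
    total = 0.0
    for i, anchor in enumerate(a):
        logits = [
            sum(x * y for x, y in zip(anchor, other)) / temperature
            for other in b
        ]
        # Numerically stable log-sum-exp over all candidates.
        top = max(logits)
        log_denominator = top + math.log(sum(math.exp(l - top) for l in logits))
        # Cross-entropy with the positive pair as the target class.
        total += log_denominator - logits[i]
    return total / len(a)


# Toy embeddings, invented for illustration: three sequences, two views each.
view_a = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
view_b = [[0.9, 0.1], [0.1, 0.9], [-0.9, -0.1]]

aligned = info_nce(view_a, view_b)
# Rotating view_b pairs each anchor with the wrong sequence.
shuffled = info_nce(view_a, view_b[1:] + view_b[:1])
```

With correctly paired views the loss is small, and it grows when the pairing is broken, which is the behavior a view-invariant representation is meant to exploit.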

Pages: 10
