Self-supervised pretext task collaborative multi-view contrastive learning for video action recognition

Cited by: 0

Authors:

Bi, Shuai [1]

Hu, Zhengping [1]

Zhao, Mengyao [1]

Zhang, Hehao [1]

Di, Jirui [1]

Sun, Zhe [1]

Affiliation:

[1] Yanshan Univ, Sch Informat Sci & Engn, West Hebei St 438, Qinhuangdao 066004, Peoples R China

Source:

SIGNAL IMAGE AND VIDEO PROCESSING | 2023 / Vol. 17 / Issue 07

Funding:

National Natural Science Foundation of China;

Keywords:

Unsupervised learning; Self-supervised learning; Pretext task learning; Multi-view contrastive learning; Action recognition;

DOI:

10.1007/s11760-023-02605-z

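Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronic technology, communication technology];

Discipline classification code: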

0808 ; 0809 ;

Abstract:

Self-supervised video representation learning attempts to extract latent spatiotemporal semantic information from unlabeled data for use in downstream visual understanding tasks. However, we found that in mainstream video datasets, the same action may be assigned to different categories depending on the environment in which it occurs. It is therefore crucial to attend to both motion features and background areas while extracting the spatial and temporal characteristics of the video. This paper presents a self-supervised action recognition framework that learns the dynamic-static features of video by combining a pretext task with cross-view contrastive learning. Specifically, we first introduce a video cloze procedure pretext task that exploits strong temporal correlations to obtain prediction categories, which serve as additional supervisory information. Next, multi-view contrastive learning is proposed to extract motion characteristics and global semantic information from consecutive video frames. With joint optimization of the pretext task and multiple contrastive losses, the recognition accuracy of our method on the UCF101 and HMDB51 datasets is 1.2% and 0.8% higher than the highest accuracy obtained by using residual contrastive learning, and 1.3% and 0.4% higher than that obtained by using RGB contrastive learning only. Experimental results with different datasets and backbone networks demonstrate that our proposal can significantly increase the generalization and robustness of the model.
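The abstract does not give implementation details, so the following is only a minimal sketch of the general idea of contrasting two views of the same clip and adding the result to a pretext-task loss. It assumes that the motion view is built from differences of consecutive frames and that the contrastive term is a standard InfoNCE loss; the function names `residual_frames`, `info_nce`, and `joint_loss`, the temperature, and the loss weight are all hypothetical and are not taken from the paper.

```python
import numpy as np


def residual_frames(clip):
    """Return differences of consecutive frames, clip[t+1] - clip[t].

    Assumption: this is one common way to build a motion-focused view;
    the paper's exact construction is not stated in the abstract.
    clip has shape (T, H, W, C); the result has shape (T-1, H, W, C).
    """
    return clip[1:] - clip[:-1]


def info_nce(z_a, z_b, temperature=0.1):
    """InfoNCE loss between two batches of embeddings of shape (N, D).

    Row i of z_a and row i of z_b are treated as the positive pair;
    every other row of z_b acts as a negative for row i of z_a.
    """
    # L2-normalize so the dot product is a cosine similarity.
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature
    # Subtract the row maximum for a numerically stable log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The positives sit on the diagonal.
    return float(-np.mean(np.diag(log_prob)))


def joint_loss(pretext_loss, contrastive_losses, weight=1.0):
    """Pretext loss plus a weighted sum of contrastive losses.

    Assumption: a plain weighted sum; the paper's weighting is not
    given in the abstract.
    """
    return pretext_loss + weight * sum(contrastive_losses)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clip = rng.random((8, 4, 4, 3))           # toy clip: 8 RGB frames
    print(residual_frames(clip).shape)        # one fewer frame than the clip
    z_rgb = rng.normal(size=(4, 16))          # stand-in RGB-view embeddings
    z_res = rng.normal(size=(4, 16))          # stand-in residual-view embeddings
    print(joint_loss(0.5, [info_nce(z_rgb, z_res), info_nce(z_res, z_rgb)]))
```

In this sketch the embeddings are random stand-ins; in practice they would come from a video backbone applied to each view, and the pretext loss would be a classification loss on the cloze prediction.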


Pages: 3775-3782

Page count: 8
