Self-supervised pretext task collaborative multi-view contrastive learning for video action recognition

Authors
Bi, Shuai [1]
Hu, Zhengping [1]
Zhao, Mengyao [1]
Zhang, Hehao [1]
Di, Jirui [1]
Sun, Zhe [1]
Affiliation
[1] Yanshan Univ, Sch Informat Sci & Engn, West Hebei St 438, Qinhuangdao 066004, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Unsupervised learning; Self-supervised learning; Pretext task learning; Multi-view contrastive learning; Action recognition;
DOI
10.1007/s11760-023-02605-z
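Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract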
Self-supervised video representation learning aims to extract latent spatiotemporal semantic information from unlabeled data for use in downstream visual understanding tasks. However, we found that in mainstream video datasets, the same action may be labeled with different categories in different environments. It is therefore crucial to attend to both motion features and background areas when extracting the spatial and temporal characteristics of a video. This paper presents a self-supervised action recognition framework that learns the dynamic-static features of video by combining a pretext task with cross-view contrastive learning. Specifically, we first introduce a video cloze procedure pretext task that exploits strong temporal correlations to obtain prediction categories, which serve to generate further supervisory information. Next, multi-view contrastive learning is proposed to extract motion characteristics and global semantic information from consecutive video frames. With joint optimization of the pretext task and multiple contrastive losses, our method achieves recognition accuracy on the UCF101 and HMDB51 datasets that is 1.2% and 0.8% higher than the highest accuracy obtained using residual contrastive learning, and 1.3% and 0.4% higher than that obtained using RGB contrastive learning only. Experimental results with different datasets and backbone networks demonstrate that our proposal can significantly increase the generalization and robustness of the model.
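The joint optimization described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact loss forms, so this assumes a cross-entropy loss for the cloze pretext prediction and a symmetric InfoNCE loss between RGB-view and residual-view embeddings. All function names, the loss weights `w_pretext` and `w_contrast`, and the `temperature` value are hypothetical.

```python
import math

def l2_normalize(v):
    # Scale a vector to unit length (guarding against a zero vector).
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def log_softmax(logits):
    # Numerically stable log-softmax over a list of scores.
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def cross_entropy(logits, target):
    # Negative log-probability of the target class.
    return -log_softmax(logits)[target]

def info_nce(view_a, view_b, temperature=0.1):
    # Assumed contrastive loss: for clip i, the embedding of the same clip
    # in the other view is the positive; all other clips are negatives.
    a = [l2_normalize(v) for v in view_a]
    b = [l2_normalize(v) for v in view_b]
    total = 0.0
    for i, anchor in enumerate(a):
        sims = [sum(x * y for x, y in zip(anchor, cand)) / temperature
                for cand in b]
        total += cross_entropy(sims, i)
    return total / len(a)

def joint_loss(cloze_logits, cloze_targets, rgb_emb, res_emb,
               w_pretext=1.0, w_contrast=1.0):
    # Hypothetical weighted sum of the pretext loss and the cross-view loss.
    pretext = sum(cross_entropy(l, t)
                  for l, t in zip(cloze_logits, cloze_targets)) / len(cloze_targets)
    contrast = 0.5 * (info_nce(rgb_emb, res_emb) + info_nce(res_emb, rgb_emb))
    return w_pretext * pretext + w_contrast * contrast
```

In this sketch, matching clips across the two views gives a lower contrastive loss than mismatched clips, which is the signal a cross-view objective of this kind relies on.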
Pages: 3775-3782
Number of pages: 8