Self-supervised pretext task collaborative multi-view contrastive learning for video action recognition

Cited by: 0
Authors
Bi, Shuai [1]
Hu, Zhengping [1]
Zhao, Mengyao [1]
Zhang, Hehao [1]
Di, Jirui [1]
Sun, Zhe [1]
Affiliations
[1] Yanshan Univ, Sch Informat Sci & Engn, West Hebei St 438, Qinhuangdao 066004, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Unsupervised learning; Self-supervised learning; Pretext task learning; Multi-view contrastive learning; Action recognition;
DOI
10.1007/s11760-023-02605-z
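Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology];
Discipline classification code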
0808 ; 0809 ;
Abstract
Self-supervised video representation learning aims to extract latent spatiotemporal semantic information from unlabeled data for use in downstream visual understanding tasks. However, we found that in mainstream video datasets, the same action may be labeled with inconsistent categories in different environments. It is therefore crucial to attend to both motion features and background areas when extracting the spatial and temporal characteristics of a video. This paper presents a self-supervised action recognition framework that learns the dynamic and static features of video by combining a pretext task with cross-view contrastive learning. Specifically, we first introduce a video cloze procedure pretext task that exploits strong temporal correlations to obtain prediction categories, from which further supervisory information is generated. Next, we propose multi-view contrastive learning to extract motion characteristics and global semantic information from consecutive video frames. With joint optimization of the pretext task and multiple contrastive losses, our method achieves recognition accuracy on the UCF101 and HMDB51 datasets that is 1.2% and 0.8% higher, respectively, than the highest accuracy obtained using residual contrastive learning, and 1.3% and 0.4% higher than that obtained using RGB contrastive learning only. Experimental results with different datasets and backbone networks demonstrate that our proposal can significantly improve the generalization and robustness of the model.
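The joint objective described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the loss formulas, so the sketch assumes an InfoNCE-style contrastive loss between two views of the same clips (here called RGB and frame-residual embeddings) plus a cross-entropy loss for the cloze pretext classifier. All function names, the `temperature` value, and the `pretext_weight` parameter are hypothetical and chosen only for illustration.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def log_softmax_at(logits, target):
    """Numerically stable log-softmax value at index `target`."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return logits[target] - lse


def info_nce(view_a, view_b, temperature=0.1):
    """InfoNCE loss: clip i in view_a is the positive for clip i in view_b,
    and every other clip in view_b acts as a negative."""
    a = [l2_normalize(v) for v in view_a]
    b = [l2_normalize(v) for v in view_b]
    losses = []
    for i, anchor in enumerate(a):
        logits = [dot(anchor, cand) / temperature for cand in b]
        losses.append(-log_softmax_at(logits, i))
    return sum(losses) / len(losses)


def cross_entropy(logits_batch, targets):
    """Mean cross-entropy for the (assumed) cloze pretext classifier."""
    total = sum(log_softmax_at(l, t) for l, t in zip(logits_batch, targets))
    return -total / len(targets)


def joint_loss(rgb_emb, residual_emb, cloze_logits, cloze_targets,
               pretext_weight=1.0, temperature=0.1):
    """Assumed joint objective: symmetric cross-view contrastive loss
    plus a weighted pretext-task loss."""
    contrast = 0.5 * (info_nce(rgb_emb, residual_emb, temperature)
                      + info_nce(residual_emb, rgb_emb, temperature))
    pretext = cross_entropy(cloze_logits, cloze_targets)
    return contrast + pretext_weight * pretext


# Toy usage: two clips, 2-D embeddings per view, 3-way cloze prediction.
rgb = [[1.0, 0.2], [0.1, 1.0]]
residual = [[0.9, 0.1], [0.2, 0.8]]
logits = [[2.0, 0.5, 0.1], [0.3, 0.2, 1.5]]
targets = [0, 2]
loss = joint_loss(rgb, residual, logits, targets)
```

In a real training setup the embeddings and logits would come from a video backbone and the loss would be minimized by gradient descent; the relative weighting of the contrastive and pretext terms is a design choice that the abstract does not specify.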
Pages: 3775 - 3782
Number of pages: 8
Related papers
50 in total
  • [1] Self-supervised pretext task collaborative multi-view contrastive learning for video action recognition
    Shuai Bi
    Zhengping Hu
    Mengyao Zhao
    Hehao Zhang
    Jirui Di
    Zhe Sun
    Signal, Image and Video Processing, 2023, 17 : 3775 - 3782
  • [2] Exploring Self-Supervised Multi-view Contrastive Learning for Speech Emotion Recognition with Limited Annotations
    Khaertdinov, Bulat
    Jeuris, Pedro
    Sousa, Annanda
    Hortal, Enrique
    INTERSPEECH 2024, 2024, : 4708 - 4712
  • [3] Multi-View Collaborative Training and Self-Supervised Learning for Group Recommendation
    Wei, Feng
    Chen, Shuyu
    MATHEMATICS, 2025, 13 (01)
  • [4] Contrastive Spatio-Temporal Pretext Learning for Self-Supervised Video Representation
    Zhang, Yujia
    Po, Lai-Man
    Xu, Xuyuan
    Liu, Mengyang
    Wang, Yexin
    Ou, Weifeng
    Zhao, Yuzhi
    Yu, Wing-Yin
    THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / THE TWELVETH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 3380 - 3389
  • [5] Contrastive Self-Supervised Learning for Skeleton Action Recognition
    Gao, Xuehao
    Yang, Yang
    Du, Shaoyi
    NEURIPS 2020 WORKSHOP ON PRE-REGISTRATION IN MACHINE LEARNING, VOL 148, 2020, 148 : 51 - 61
  • [6] Self-supervised learning for multi-view stereo
    Ito S.
    Kaneko N.
    Sumi K.
    Seimitsu Kogaku Kaishi/Journal of the Japan Society for Precision Engineering, 2020, 86 (12) : 1042 - 1050
  • [7] Time-Contrastive Networks: Self-Supervised Learning from Multi-View Observation
    Sermanet, Pierre
    Lynch, Corey
    Hsu, Jasmine
    Levine, Sergey
    2017 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW), 2017, : 486 - 487
  • [8] Self-supervised Metric Learning in Multi-View Data: A Downstream Task Perspective
    Wang, Shulei
    JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2023, 118 (544) : 2454 - 2467
  • [9] Multi-View Action Recognition using Contrastive Learning
    Shah, Ketul
    Shah, Anshul
    Lau, Chun Pong
    de Melo, Celso M.
    Chellappa, Rama
    2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2023, : 3370 - 3380
  • [10] Part Aware Contrastive Learning for Self-Supervised Action Recognition
    Hua, Yilei
    Wu, Wenhan
    Zheng, Ce
    Lu, Aidong
    Liu, Mengyuan
    Chen, Chen
    Wu, Shiqian
    PROCEEDINGS OF THE THIRTY-SECOND INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2023, 2023, : 855 - 863