Self-supervised pretext task collaborative multi-view contrastive learning for video action recognition

Cited by: 0
Authors
Bi, Shuai [1]
Hu, Zhengping [1]
Zhao, Mengyao [1]
Zhang, Hehao [1]
Di, Jirui [1]
Sun, Zhe [1]
Affiliations
[1] Yanshan Univ, Sch Informat Sci & Engn, West Hebei St 438, Qinhuangdao 066004, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Unsupervised learning; Self-supervised learning; Pretext task learning; Multi-view contrastive learning; Action recognition;
DOI
10.1007/s11760-023-02605-z
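Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract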
Self-supervised video representation learning aims to extract latent spatiotemporal semantic information from unlabeled data for use in downstream visual understanding tasks. However, we found that in mainstream video datasets, the same action may be labeled with inconsistent categories in different environments. It is therefore crucial to attend to both motion features and background areas when extracting the spatial and temporal characteristics of a video. This paper presents a self-supervised action recognition framework that learns the dynamic-static features of video by combining a pretext task with cross-view contrastive learning. Specifically, we first introduce a video cloze procedure pretext task that exploits strong temporal correlations to obtain prediction categories, which serve as generated supervisory information. Next, multi-view contrastive learning is proposed to extract motion characteristics and global semantic information from consecutive video frames. Through joint optimization of the pretext task and multiple contrastive losses, our method achieves recognition accuracy on the UCF101 and HMDB51 datasets that is 1.2% and 0.8% higher, respectively, than the highest accuracy obtained with residual contrastive learning, and 1.3% and 0.4% higher than that obtained with RGB contrastive learning only. Experimental results with different datasets and backbone networks demonstrate that our proposal can significantly increase the generalization and robustness of the model.
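The abstract does not give the exact loss functions, so the following is only a minimal sketch of how a pretext loss and a cross-view contrastive loss might be optimized jointly. It assumes an InfoNCE-style contrastive term between RGB and residual embeddings, a cross-entropy term for the cloze-style pretext classification, and simple frame differencing as the residual view. All function names, the temperature, and the `weight` parameter are hypothetical and are not taken from the paper.

```python
import math

def residual_frames(frames):
    """Differences between consecutive frames: one common way to build a
    motion-oriented 'residual' view (an assumption, not the paper's recipe)."""
    return [[b - a for a, b in zip(f0, f1)] for f0, f1 in zip(frames, frames[1:])]

def _normalize(v):
    # L2-normalize a vector; a zero vector is returned unchanged.
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def _log_sum_exp(values):
    # Numerically stable log(sum(exp(values))).
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def info_nce(view_a, view_b, temperature=0.1):
    """InfoNCE: embedding i in view_a should match embedding i in view_b
    and be dissimilar to every other embedding in view_b."""
    a = [_normalize(v) for v in view_a]
    b = [_normalize(v) for v in view_b]
    total = 0.0
    for i, ai in enumerate(a):
        logits = [sum(x * y for x, y in zip(ai, bj)) / temperature for bj in b]
        total += _log_sum_exp(logits) - logits[i]
    return total / len(a)

def cross_entropy(logits, target):
    """Cross-entropy for one sample, e.g. predicting the pretext category."""
    return _log_sum_exp(logits) - logits[target]

def joint_loss(rgb_emb, res_emb, pretext_logits, pretext_targets, weight=1.0):
    """Pretext loss plus a symmetric RGB <-> residual contrastive loss.
    `weight` is a hypothetical balancing coefficient."""
    contrast = info_nce(rgb_emb, res_emb) + info_nce(res_emb, rgb_emb)
    pretext = sum(cross_entropy(l, t) for l, t in zip(pretext_logits, pretext_targets))
    return pretext / len(pretext_targets) + weight * contrast
```

In this sketch the contrastive term is small when the RGB and residual embeddings of the same clip agree and large when they are mismatched, while the pretext term supplies the additional classification signal. A real implementation would compute the embeddings and logits with a video backbone and a deep learning framework.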
Pages: 3775-3782
Number of pages: 8