Self-Supervised Video Representation Learning with Constrained Spatiotemporal Jigsaw

Cited by: 0
Authors
Huo, Yuqi [1, 2]
Ding, Mingyu [3]
Lu, Haoyu [1]
Huang, Ziyuan [4]
Tang, Mingqian [5]
Lu, Zhiwu [2]
Xiang, Tao [6]
Affiliations
[1] Renmin Univ China, Sch Informat, Beijing, Peoples R China
[2] Renmin Univ China, Gaoling Sch Artificial Intelligence, Beijing, Peoples R China
[3] Univ Hong Kong, Hong Kong, Peoples R China
[4] Natl Univ Singapore, Singapore, Singapore
[5] Alibaba Grp, Hangzhou, Peoples R China
[6] Univ Surrey, Surrey, England
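Funding
National Natural Science Foundation of China
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract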
This paper proposes a novel pretext task for self-supervised video representation learning that exploits spatiotemporal continuity in videos. It is motivated by the observation that videos are spatiotemporal by nature, so a representation learned by detecting spatiotemporal continuity and discontinuity should benefit downstream video content analysis tasks. A natural choice for such a pretext task is to construct spatiotemporal (3D) jigsaw puzzles and learn to solve them. However, as we demonstrate in the experiments, this task turns out to be intractable. We thus propose Constrained Spatiotemporal Jigsaw (CSJ), in which the 3D jigsaws are formed in a constrained manner to ensure that large continuous spatiotemporal cuboids exist. This provides sufficient cues for the model to reason about continuity. Instead of solving the puzzles directly, which could still be extremely hard, we carefully design four surrogate tasks that are more solvable. The four tasks aim to learn representations sensitive to spatiotemporal continuity at both the local and global levels. Extensive experiments show that CSJ achieves state-of-the-art results on various benchmarks.
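The contrast between an unconstrained 3D jigsaw and a constrained one can be illustrated with a small sketch. It is not the authors' construction: the abstract does not say how the constraint is imposed, so the group-wise shuffle below (contiguous index groups permuted independently along each axis) is only an assumed way to guarantee that large continuous cuboids survive. The function names `unconstrained_count`, `grouped_axis_order`, and `constrained_jigsaw` are hypothetical.

```python
import math
import random


def unconstrained_count(t, h, w):
    """Number of arrangements of a t x h x w grid of pieces (all permutations)."""
    return math.factorial(t * h * w)


def grouped_axis_order(n, n_groups, rng):
    """Split indices 0..n-1 into contiguous groups and shuffle only the groups.

    Indices inside a group keep their original order, so each group stays
    a continuous run along this axis. (Assumed scheme, not from the paper.)
    """
    size = math.ceil(n / n_groups)
    groups = [list(range(s, min(s + size, n))) for s in range(0, n, size)]
    rng.shuffle(groups)
    return [i for g in groups for i in g]


def constrained_jigsaw(t, h, w, n_groups=2, seed=0):
    """Build a puzzle where entry [i][j][k] is the original (t, h, w)
    coordinate of the piece now placed at slot (i, j, k)."""
    rng = random.Random(seed)
    order_t = grouped_axis_order(t, n_groups, rng)
    order_h = grouped_axis_order(h, n_groups, rng)
    order_w = grouped_axis_order(w, n_groups, rng)
    return [[[(a, b, c) for c in order_w] for b in order_h] for a in order_t]


if __name__ == "__main__":
    # Even a tiny 2 x 2 x 2 puzzle already has 8! arrangements.
    print(unconstrained_count(2, 2, 2))
    puzzle = constrained_jigsaw(4, 4, 4)
    print(puzzle[0][0])  # first row of pieces in the shuffled clip
```

With two groups per axis, this assumed scheme yields at most 2 x 2 x 2 = 8 distinct arrangements, each made of up to eight intact cuboids, whereas the unconstrained count grows factorially with the number of pieces.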
Pages: 751-757
Number of pages: 7