Contrastive Masked Autoencoders for Self-Supervised Video Hashing

Times cited: 0
Authors
Wang, Yuting [1 ,3 ]
Wang, Jinpeng [1 ,3 ]
Chen, Bin [2 ]
Zeng, Ziyun [1 ,3 ]
Xia, Shu-Tao [1 ,3 ]
Affiliations
[1] Tsinghua Univ, Tsinghua Shenzhen Int Grad Sch, Beijing, Peoples R China
[2] Harbin Inst Technol, Shenzhen, Peoples R China
[3] Peng Cheng Lab, Res Ctr Artificial Intelligence, Shenzhen, Peoples R China
Funding
National Natural Science Foundation of China
Keywords: none listed
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Self-Supervised Video Hashing (SSVH) models learn to generate short binary representations of videos without ground-truth supervision, enabling efficient large-scale video retrieval, and have attracted increasing research attention. The success of SSVH depends on understanding video content and capturing the semantic relations among unlabeled videos. State-of-the-art SSVH methods typically address these two aspects in a two-stage training pipeline: they first train an auxiliary network with instance-wise mask-and-predict tasks, and then train a hashing model to preserve the pseudo-neighborhood structure transferred from the auxiliary network. This sequential training strategy is inflexible and also unnecessary. In this paper, we propose a simple yet effective one-stage SSVH method called ConMH, which learns both video semantic information and inter-video similarity relationships in a single stage. To capture video semantic information, we adopt an encoder-decoder structure that reconstructs the video from its temporally masked frames. In particular, we find that a higher masking ratio helps video understanding. In addition, we fully exploit the similarity relationships between videos by maximizing the agreement between two augmented views of each video, which contributes to more discriminative and robust hash codes. Extensive experiments on three large-scale video datasets (FCVID, ActivityNet, and YFCC) show that ConMH achieves state-of-the-art results. Code is available at https://github.com/huangmozhi9527/ConMH.
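The abstract combines three generic ingredients: reconstruction from temporally masked frames, contrastive agreement between two views of the same video, and binary codes ranked by Hamming distance. The NumPy sketch below illustrates these building blocks on random stand-in frame features. It is not the ConMH implementation (the linked repository is the authoritative source). The toy mean-pool encoder, the placeholder decoder, the 0.75 mask ratio, the use of two independent random temporal masks as the two "augmented views", the cross-view InfoNCE loss, the masked-only MSE, and all sizes are assumptions chosen for illustration.

```python
import numpy as np

def temporal_mask(num_frames, mask_ratio, rng):
    """Randomly hide a fraction of frames; return sorted (visible_idx, masked_idx)."""
    num_masked = int(round(num_frames * mask_ratio))
    if not 0 <= num_masked < num_frames:
        raise ValueError("mask_ratio must leave at least one visible frame")
    perm = rng.permutation(num_frames)
    return np.sort(perm[num_masked:]), np.sort(perm[:num_masked])

def toy_encode(frames, visible_idx, W):
    """Hypothetical encoder stand-in: mean-pool visible frames, project, squash with tanh."""
    return np.tanh(frames[visible_idx].mean(axis=0) @ W)

def masked_recon_loss(pred, target, masked_idx):
    """MAE-style objective (assumed): mean squared error over the masked frames only."""
    diff = pred[masked_idx] - target[masked_idx]
    return float(np.mean(diff ** 2))

def info_nce(z1, z2, temperature=0.5):
    """Symmetric cross-view InfoNCE: row i of z1 and row i of z2 form the positive pair,
    every other row of the opposite view acts as a negative."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature  # (B, B) scaled cosine similarities

    def cross_entropy_on_diagonal(l):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_prob = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_prob))

    return 0.5 * (cross_entropy_on_diagonal(logits) + cross_entropy_on_diagonal(logits.T))

def to_hash(h):
    """Binarize continuous codes to {-1, +1} with the sign function (0 maps to +1)."""
    return np.where(h >= 0, 1, -1).astype(np.int8)

def hamming(query, database):
    """Hamming distance from one {-1, +1} code to each row: (K - <a, b>) / 2."""
    k = query.shape[0]
    return (k - database.astype(np.int32) @ query.astype(np.int32)) // 2

# Toy run on random features; every size below is an assumption.
rng = np.random.default_rng(0)
B, T, D, K = 4, 25, 64, 16               # videos, frames per video, feature dim, hash bits
videos = rng.normal(size=(B, T, D))      # stand-in for pre-extracted frame features
W = rng.normal(size=(D, K)) / np.sqrt(D)

views, rec_losses = [], []
for _ in range(2):                       # two independently masked views of each video
    codes = []
    for v in videos:
        vis, msk = temporal_mask(T, mask_ratio=0.75, rng=rng)
        codes.append(toy_encode(v, vis, W))
        # Placeholder decoder: predict every frame as the mean of the visible frames.
        pred = np.repeat(v[vis].mean(axis=0, keepdims=True), T, axis=0)
        rec_losses.append(masked_recon_loss(pred, v, msk))
    views.append(np.stack(codes))

total = np.mean(rec_losses) + info_nce(views[0], views[1])
db = to_hash(views[0])
print("loss:", round(float(total), 4))
print("Hamming distances from video 0:", hamming(db[0], db))
```

With {-1, +1} codes of length K, the Hamming distance equals (K - <a, b>) / 2, so ranking a whole database against a query reduces to one integer matrix-vector product; this is the efficiency argument behind hashing-based retrieval. In a real model, the encoder and decoder would be trained networks and the two losses would be minimized jointly by gradient descent, which this sketch does not attempt.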
Pages: 2733-2741
Number of pages: 9
Related papers (50 in total)
  • [1] ViC-MAE: Self-supervised Representation Learning from Images and Video with Contrastive Masked Autoencoders
    Hernandez, Jefferson
    Villegas, Ruben
    Ordonez, Vicente
    COMPUTER VISION-ECCV 2024, PT IV, 2025, 15062 : 444 - 463
  • [2] GraphMAE: Self-Supervised Masked Graph Autoencoders
    Hou, Zhenyu
    Liu, Xiao
    Cen, Yukuo
    Dong, Yuxiao
    Yang, Hongxia
    Wang, Chunjie
    Tang, Jie
    PROCEEDINGS OF THE 28TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2022, 2022, : 594 - 604
  • [3] Self-supervised Bernoulli Autoencoders for Semi-supervised Hashing
    Nanculef, Ricardo
    Mena, Francisco
    Macaluso, Antonio
    Lodi, Stefano
    Sartori, Claudio
    PROGRESS IN PATTERN RECOGNITION, IMAGE ANALYSIS, COMPUTER VISION, AND APPLICATIONS, CIARP 2021, 2021, 12702 : 258 - 268
  • [4] Masked Autoencoders for Point Cloud Self-supervised Learning
    Pang, Yatian
    Wang, Wenxiao
    Tay, Francis E. H.
    Liu, Wei
    Tian, Yonghong
    Yuan, Li
    COMPUTER VISION - ECCV 2022, PT II, 2022, 13662 : 604 - 621
  • [5] Contrastive Self-Supervised Hashing With Dual Pseudo Agreement
    Li, Yang
    Wang, Yapeng
    Miao, Zhuang
    Wang, Jiabao
    Zhang, Rui
    IEEE ACCESS, 2020, 8 : 165034 - 165043
  • [6] Contrastive Self-Supervised Learning as a Strong Baseline for Unsupervised Hashing
    Yang, Huei-Fang
    2022 IEEE 24TH INTERNATIONAL WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING (MMSP), 2022,
  • [7] CMAE-3D: Contrastive Masked AutoEncoders for Self-Supervised 3D Object Detection
    Zhang, Yanan
    Chen, Jiaxin
    Huang, Di
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2024, : 2783 - 2804
  • [8] VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
    Tong, Zhan
    Song, Yibing
    Wang, Jue
    Wang, Limin
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [9] Self-Supervised Temporal Sensitive Hashing for Video Retrieval
    Li, Qihua
    Tian, Xing
    Ng, Wing W. Y.
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 9021 - 9035
  • [10] Self-supervised Video Hashing via Bidirectional Transformers
    Li, Shuyan
    Li, Xiu
    Lu, Jiwen
    Zhou, Jie
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 13544 - 13553