Self-supervised Video Hashing via Bidirectional Transformers

Cited by: 30
Authors
Li, Shuyan [1, 3]
Li, Xiu [1, 3]
Lu, Jiwen [1, 2]
Zhou, Jie [1, 2]
Affiliations
[1] Tsinghua Univ, Dept Automat, Beijing, Peoples R China
[2] Beijing Natl Res Ctr Informat Sci & Technol, Beijing, Peoples R China
[3] Tsinghua Univ, Grad Sch Shenzhen, Beijing, Peoples R China
Source
2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021 | 2021
Funding
National Natural Science Foundation of China;
Keywords
QUANTIZATION;
DOI
10.1109/CVPR46437.2021.01334
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Most existing unsupervised video hashing methods are built on unidirectional models with less reliable training objectives, which underuse the correlations among frames and the similarity structure between videos. To enable efficient scalable video retrieval, we propose a self-supervised video Hashing method based on Bidirectional Transformers (BTH). Based on the encoder-decoder structure of transformers, we design a visual cloze task to fully exploit the bidirectional correlations between frames. To unveil the similarity structure between unlabeled video data, we further develop a similarity reconstruction task by establishing reliable and effective similarity connections in the video space. Furthermore, we develop a cluster assignment task to exploit the structural statistics of the whole dataset such that more discriminative binary codes can be learned. Extensive experiments implemented on three public benchmark datasets, FCVID, ActivityNet and YFCC, demonstrate the superiority of our proposed approach.
Pages: 13544-13553
Page count: 10