Self-Supervised Video Hashing With Hierarchical Binary Auto-Encoder

Cited by: 210
Authors
Song, Jingkuan [1]
Zhang, Hanwang [2]
Li, Xiangpeng [1]
Gao, Lianli [1]
Wang, Meng [3]
Hong, Richang [3]
Affiliations
[1] Univ Elect Sci & Technol China, Ctr Future Media, Sch Comp Sci & Engn, Chengdu 611731, Sichuan, Peoples R China
[2] Nanyang Technol Univ, Singapore 639798, Singapore
[3] Hefei Univ Technol, Hefei 230009, Anhui, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Video hashing; video retrieval; self-supervised; binary LSTM; neighbor model; ACTION RECOGNITION; QUANTIZATION; IMAGE;
DOI
10.1109/TIP.2018.2814344
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Existing video hash functions are built on three isolated stages: frame pooling, relaxed learning, and binarization. These stages do not adequately exploit the temporal order of video frames in a joint binary optimization model, resulting in severe information loss. In this paper, we propose a novel unsupervised video hashing framework dubbed self-supervised video hashing (SSVH), which captures the temporal nature of videos in an end-to-end learning-to-hash fashion. We specifically address two central problems: 1) how to design an encoder-decoder architecture to generate binary codes for videos and 2) how to equip the binary codes with the ability of accurate video retrieval. We design a hierarchical binary auto-encoder to model the temporal dependencies in videos at multiple granularities, and embed the videos into binary codes with fewer computations than the stacked architecture. Then, we encourage the binary codes to simultaneously reconstruct the visual content and neighborhood structure of the videos. Experiments on two real-world data sets show that our SSVH method significantly outperforms the state-of-the-art methods and achieves the current best performance on the task of unsupervised video retrieval.
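The retrieval step that the abstract refers to can be illustrated with a minimal sketch. It is not the SSVH model: the abstract describes a hierarchical binary auto-encoder trained end to end, whereas this example assumes the real-valued video features are already given and substitutes plain sign thresholding for binarization. The function names (`binarize`, `hamming`, `retrieve`) and the toy feature values are hypothetical and serve only to show how binary codes are ranked by Hamming distance.

```python
def binarize(features):
    """Pack a real-valued vector into an int, one bit per dimension.

    Assumption: simple sign thresholding (bit = 1 if value >= 0),
    a stand-in for the learned binary encoder described in the abstract.
    """
    code = 0
    for value in features:
        code = (code << 1) | (1 if value >= 0 else 0)
    return code


def hamming(a, b):
    """Count the bit positions in which two packed codes differ."""
    return bin(a ^ b).count("1")


def retrieve(query_code, database, top_k=2):
    """Return the ids of the top_k codes closest to the query.

    Ties in Hamming distance are broken by id so the order is deterministic.
    """
    ranked = sorted(
        database.items(),
        key=lambda item: (hamming(query_code, item[1]), item[0]),
    )
    return [video_id for video_id, _ in ranked[:top_k]]


# Hypothetical 4-dimensional features standing in for encoder outputs.
database = {
    "video_a": binarize([0.9, -0.2, 0.4, -0.7]),   # bits 1010
    "video_b": binarize([0.8, -0.1, -0.3, -0.6]),  # bits 1000
    "video_c": binarize([-0.5, 0.6, -0.4, 0.2]),   # bits 0101
}
query = binarize([0.7, -0.3, 0.5, -0.1])           # bits 1010

print(retrieve(query, database))  # ['video_a', 'video_b']
```

Packing each code into an integer keeps the distance computation to one XOR and a bit count, which is the usual reason binary codes are attractive for large-scale retrieval.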
Pages: 3210-3221
Number of pages: 12