A supervised deep convolutional based bidirectional long short term memory video hashing for large scale video retrieval applications

Cited by: 17
Authors
Anuranji, R. [1 ]
Srimathi, H. [1 ]
Affiliations
[1] SRM Inst Sci & Technol, Comp Sci Engn, Chennai 603203, Tamil Nadu, India
Keywords
Deep learning; Hashing; Video event retrieval; Temporal hashing; LSTM; Scalable video search; QUANTIZATION; IMAGE;
DOI
10.1016/j.dsp.2020.102729
CLC number (Chinese Library Classification)
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Subject classification codes
0808 ; 0809 ;
Abstract
Recently, large-scale video content retrieval has gained attention due to the large amount of user-generated image and video content available on the internet. Hashing is an effective technique that encodes high-dimensional feature vectors into compact binary codes. The aim of hashing is to generate short binary codes and map similar videos to similar hash values, so that videos resembling a query can be retrieved from the database by a minimum-distance measure. Deep learning-based hashing networks are employed to learn representative video features from which the hash functions are estimated. However, existing hashing approaches rely on frame-level feature representations and do not fully exploit the temporal features that are effective for visual search. Furthermore, significant feature loss during the dimensionality-reduction step lowers accuracy. Hence, it is essential to develop a deep learning-based hashing framework that exploits both strong spatial and temporal features for scalable video search. The main objective is to learn high-dimensional features from the entire video and derive compact binary codes that retrieve videos similar to the input query sequence. In this paper, we propose a joint supervised network model, a stacked heterogeneous convolutional multi-kernel (Stacked HetConvMK)-bidirectional Long Short-Term Memory (BiDLSTM) network, that effectively encodes both the rich structural and the discriminative features of the video sequence to estimate compact binary codes. First, the video frames are passed through stacked convolution networks with heterogeneous convolutional kernel sizes and residual learning, which extract spatial features at different views from the video sequences and improve learning efficiency. Then, the bidirectional network processes the sequence in both forward and backward directions and produces a series of hidden-state outputs.
Finally, a fully connected structure with an activation unit performs hashing to learn multiple codes for each video. Experimental analysis is performed on three datasets, and the results show better accuracy than other state-of-the-art approaches. (C) 2020 Published by Elsevier Inc.
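The retrieval principle the abstract describes can be sketched independently of the network: a learned model maps a video to a real-valued vector, which is binarized into a compact hash code, and the database is ranked by Hamming distance to the query's code. The sign-based binarization, the toy 8-bit codes, and the feature values below are illustrative stand-ins, not the paper's Stacked HetConvMK-BiDLSTM model.

```python
def binarize(features):
    """Map a real-valued feature vector to a binary hash code via sign thresholding."""
    return tuple(1 if x >= 0.0 else 0 for x in features)

def hamming(a, b):
    """Count bit positions where two equal-length codes differ."""
    return sum(x != y for x, y in zip(a, b))

def retrieve(query_features, database, top_k=2):
    """Rank database videos by Hamming distance between hash codes; return top_k names."""
    q = binarize(query_features)
    ranked = sorted(database.items(), key=lambda kv: hamming(q, binarize(kv[1])))
    return [name for name, _ in ranked[:top_k]]

# Hypothetical 8-dimensional "video features" standing in for learned descriptors.
db = {
    "video_a": [0.9, -0.2, 0.4, 0.7, -0.5, 0.1, -0.3, 0.8],
    "video_b": [-0.6, 0.3, -0.1, -0.9, 0.2, -0.4, 0.5, -0.7],
    "video_c": [0.8, -0.1, 0.5, 0.6, -0.4, 0.2, -0.2, 0.9],
}
query = [1.0, -0.3, 0.2, 0.5, -0.6, 0.3, -0.1, 0.7]
print(retrieve(query, db))  # video_a and video_c share the query's bit pattern
```

Because comparison happens on short binary codes rather than the original high-dimensional features, distance computation reduces to cheap bit operations, which is what makes hashing attractive for large-scale search.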
Pages: 12
Related Papers
45 references in total
[21] Lin, Zijia; Ding, Guiguang; Han, Jungong; Wang, Jianmin. Cross-View Retrieval via Probability-Based Semantics-Preserving Hashing [J]. IEEE TRANSACTIONS ON CYBERNETICS, 2017, 47(12): 4342-4355.
[22] Liong, Venice Erin; Lu, Jiwen; Tan, Yap-Peng; Zhou, Jie. Deep Video Hashing [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2017, 19(06): 1209-1219.
[23] Liong VE, 2015, PROC CVPR IEEE, P2475, DOI 10.1109/CVPR.2015.7298862
[24] Liu, Jiajun; Huang, Zi; Cai, Hongyun; Shen, Heng Tao; Ngo, Chong Wah; Wang, Wei. Near-Duplicate Video Retrieval: Current Research and Future Trends [J]. ACM COMPUTING SURVEYS, 2013, 45(04).
[25] Liu W, 2012, PROC CVPR IEEE, P2074, DOI 10.1109/CVPR.2012.6247912
[26] Natarajan P, 2012, PROC CVPR IEEE, P1298, DOI 10.1109/CVPR.2012.6247814
[27] Pan, Pingbo; Xu, Zhongwen; Yang, Yi; Wu, Fei; Zhuang, Yueting. Hierarchical Recurrent Neural Encoder for Video Representation with Application to Captioning [J]. 2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016: 1029-1038.
[28] Shang Lifeng, 2010, P 18 ACM INT C MULT, P531
[29] Shen FM, 2015, PROC CVPR IEEE, P37, DOI 10.1109/CVPR.2015.7298598
[30] Shen, Ling; Hong, Richang; Zhang, Haoran; Tian, Xinmei; Wang, Meng. Video Retrieval with Similarity-Preserving Deep Temporal Hashing [J]. ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2019, 15(04).