A supervised deep convolutional based bidirectional long short term memory video hashing for large scale video retrieval applications

Cited by: 17
Authors
Anuranji, R. [1]
Srimathi, H. [1]
Affiliations
[1] SRM Inst Sci & Technol, Comp Sci Engn, Chennai 603203, Tamil Nadu, India
Keywords
Deep learning; Hashing; Video event retrieval; Temporal hashing; LSTM; Scalable video search; Quantization; Image
DOI
10.1016/j.dsp.2020.102729
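Chinese Library Classification (CLC) number
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Discipline classification codes
0808; 0809
Abstract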
Large-scale video content retrieval has recently gained attention because of the large amount of user-generated image and video content available on the internet. Hashing is an effective technique that encodes high-dimensional feature vectors into compact binary codes. The aim of hashing is to generate short binary codes such that similar videos map to similar hash codes, so that similar videos can be retrieved from the database by minimum distance. Deep learning-based hashing networks are employed to learn representative video features and estimate the hash functions. However, existing hashing approaches rely on frame-level feature representations and do not exploit temporal features effectively in visual search. Furthermore, a significant loss of features during the dimensionality reduction step lowers accuracy. Hence, it is essential to develop a deep learning-based hashing framework that exploits both strong spatial and temporal features for scalable video search. The main objective is to learn high-dimensional features from the entire video and derive compact binary codes to retrieve videos similar to the input query sequence. In this paper, we propose a joint, supervised network model that combines a stacked heterogeneous convolutional multi-kernel (Stacked HetConvMK) network with a bidirectional Long Short Term Memory (BiDLSTM) network and that effectively encodes rich structural and discriminative features from the video sequence to estimate compact binary codes. First, the video frames are passed to stacked convolutional networks with heterogeneous convolutional kernel sizes and residual learning, which extract spatial features at different views from the video sequences and improve learning efficiency. Then, the bidirectional network processes the sequence in both forward and backward directions and produces a series of hidden state outputs. Finally, a fully connected structure with an activation unit performs hashing to learn multiple codes for each video. Experimental analysis on three datasets shows better accuracy than other state-of-the-art approaches. © 2020 Published by Elsevier Inc.
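The retrieval step described in the abstract (compact binary codes compared by a minimum distance measure) can be illustrated with a minimal sketch. The abstract does not state how real-valued network outputs are binarized or which distance is used, so sign thresholding and Hamming distance are assumptions here, chosen because they are common in the hashing literature. The function names `binarize`, `hamming_distance`, and `retrieve`, and the toy feature values, are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def binarize(features: Sequence[float]) -> int:
    """Pack the sign bits of a real-valued vector into an integer hash code.

    Assumption: a positive value maps to bit 1, anything else to bit 0.
    """
    code = 0
    for value in features:
        code = (code << 1) | (1 if value > 0 else 0)
    return code


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two hash codes differ."""
    return bin(a ^ b).count("1")


def retrieve(query_code: int, database_codes: Sequence[int], top_k: int) -> List[int]:
    """Return indices of the top_k database codes closest to the query.

    Ties are broken by database index so the ranking is deterministic.
    """
    order = sorted(
        range(len(database_codes)),
        key=lambda i: (hamming_distance(query_code, database_codes[i]), i),
    )
    return order[:top_k]


# Toy stand-ins for per-video outputs of the hashing layer (hypothetical values).
video_features = [
    [0.9, -0.2, 0.4, -0.7],
    [-0.5, 0.3, -0.1, 0.8],
    [0.6, -0.4, -0.3, -0.9],
]
database = [binarize(v) for v in video_features]
query = binarize([0.8, -0.1, 0.5, -0.6])
ranking = retrieve(query, database, top_k=2)
print(ranking)
```

Storing each code as an integer keeps the comparison to one XOR and a bit count, which is the property that makes binary codes attractive for large databases. A real system would use much longer codes and an index structure rather than a full sort.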
Pages: 12