Object-oriented convolutional features for fine-grained image retrieval in large surveillance datasets

Cited by: 24

Authors:

Ahmad, Jamil [1]

Muhammad, Khan [1]

Bakshi, Sambit [2]

Baik, Sung Wook [1]

Affiliations:

[1] Sejong Univ, Digital Contents Res Inst, Seoul, South Korea

[2] Natl Inst Technol, Dept Comp Sci & Engn, Rourkela, India

Source:

FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE | 2018, Vol. 81

Funding:

National Research Foundation, Singapore;

Keywords:

Image retrieval; Object-oriented features; Convolutional neural network; Fine-grained retrieval; REPRESENTATION; QUANTIZATION;

DOI:

10.1016/j.future.2017.11.002

Chinese Library Classification (CLC) number:

TP301 [Theory, methods];

Discipline classification code:

081202;

Abstract:

Large-scale visual surveillance generates huge volumes of data at a rapid pace, giving rise to massive image repositories. Efficient and reliable access to relevant data in these ever-growing databases is a highly challenging task due to the complex nature of surveillance objects. Furthermore, inter-class visual similarity between vehicles requires extraction of fine-grained and highly discriminative features. In recent years, features from deep convolutional neural networks (CNNs) have exhibited state-of-the-art performance in image retrieval. However, these features have been used without regard to their sensitivity to objects of a particular class. In this paper, we propose an object-oriented feature selection mechanism for deep convolutional features from a pre-trained CNN. Convolutional feature maps from a deep layer are selected based on an analysis of their responses to surveillance objects. The selected features represent semantic features of surveillance objects and their parts with minimal influence from the background, effectively eliminating the need for a background removal procedure prior to feature extraction. Layer-wise mean activations from the selected feature maps form the discriminative descriptor for each object. These object-oriented convolutional features (OOCF) are then projected onto a low-dimensional Hamming space using locality-sensitive hashing approaches. The resulting compact binary hash codes allow efficient retrieval within large-scale datasets. Results on five challenging datasets reveal that OOCF achieves better precision and recall than the full feature set for objects with varying backgrounds. © 2017 Elsevier B.V. All rights reserved.
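As a rough illustration of the pipeline the abstract outlines (select feature maps, average their activations, hash, retrieve by Hamming distance), the sketch below may help. It is not the authors' implementation. The abstract does not say how maps are selected or which hashing variant is used, so the `selected` indices are taken as given, random-hyperplane (sign) hashing stands in for the "locality-sensitive hashing approaches", and all function names are hypothetical.

```python
import numpy as np

def oocf_descriptor(feature_maps, selected):
    """Mean activation of each selected feature map.

    feature_maps: array of shape (C, H, W) from one convolutional layer.
    selected: indices of the maps assumed to respond to the object class
              (how they are chosen is not specified in the abstract).
    """
    return feature_maps[selected].mean(axis=(1, 2))

def make_hyperplanes(dim, n_bits, seed=0):
    """Random hyperplanes for sign-based LSH (an assumed variant)."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((n_bits, dim))

def hash_code(descriptor, hyperplanes):
    """Binary code: one bit per hyperplane, set when the projection is positive."""
    return (hyperplanes @ descriptor > 0).astype(np.uint8)

def hamming_rank(query_code, db_codes):
    """Return database indices sorted by Hamming distance, plus the distances."""
    dists = np.count_nonzero(db_codes != query_code, axis=1)
    return np.argsort(dists, kind="stable"), dists

# Usage with synthetic data: 5 images, 64 feature maps of size 7x7 each.
rng = np.random.default_rng(42)
selected = [3, 10, 17, 25, 40]            # hypothetical selected map indices
planes = make_hyperplanes(len(selected), n_bits=32)
db_codes = np.stack([
    hash_code(oocf_descriptor(rng.random((64, 7, 7)), selected), planes)
    for _ in range(5)
])
order, dists = hamming_rank(db_codes[2], db_codes)  # query with image 2
```

Because the codes are short binary vectors, ranking reduces to counting differing bits, which is what makes retrieval over large collections cheap.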

Pages: 314-330

Page count: 17

References (58 in total):

[1] Embedded deep vision in smart cameras for multi-view objects representation and retrieval [J].