Towards Reversal-Invariant Image Representation

Cited by: 0

Authors:

Lingxi Xie

Jingdong Wang

Weiyao Lin

Bo Zhang

Qi Tian

Affiliations:

[1] The Johns Hopkins University

[2] Microsoft Research

[3] Shanghai Jiao Tong University

[4] Tsinghua University

[5] University of Texas at San Antonio

Source:

International Journal of Computer Vision | 2017 / Vol. 123

Keywords:

Image classification; BoF; CNN; Reversal-invariant image representation

DOI:

Not available

Chinese Library Classification number:

Subject classification number:

Abstract:

State-of-the-art image classification approaches are mainly based on robust image representations, such as the bag-of-features (BoF) model or the convolutional neural network (CNN) architecture. In real applications, the left/right orientation of an image or an object may vary from sample to sample, yet some handcrafted descriptors (e.g., SIFT) and network operations (e.g., convolution) are not reversal-invariant, which makes the image features extracted from these models unstable. A popular remedy is to augment the dataset by adding a left-right reversed copy of each image. This strategy improves recognition accuracy to some extent, but at the price of almost doubling time and memory consumption in both the training and testing stages. In this paper, we present an alternative solution based on designing reversal-invariant representations of local patterns, so that an image and its left-right reversed copy yield an identical representation. For the BoF model, we design a reversal-invariant version of the SIFT descriptor named Max-SIFT, a generalized RIDE algorithm that can be applied to a large family of local descriptors. For the CNN architecture, we present a simple idea for generating reversal-invariant deep features (RI-Deep) and, inspired by it, design reversal-invariant convolution (RI-Conv) layers to increase CNN capacity without increasing model complexity. Experiments reveal consistent accuracy gains on various image classification tasks, including scene understanding, fine-grained object recognition, and large-scale visual recognition.
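As a rough illustration of the goal stated in the abstract (an identical representation for an image and its left-right reversed copy), the sketch below canonicalizes a descriptor by comparing it with the descriptor of the flipped image and keeping a fixed choice of the two. It is not the authors' Max-SIFT, RIDE, RI-Deep, or RI-Conv; the names `flip_lr`, `toy_descriptor`, and `reversal_invariant`, and the lexicographic selection rule, are assumptions made only for this example.

```python
def flip_lr(image):
    """Return the left-right reversed copy of a 2D image (list of rows)."""
    return [list(reversed(row)) for row in image]


def toy_descriptor(image):
    """Hypothetical descriptor: column sums read left to right.

    It is deliberately NOT reversal-invariant, since flipping the image
    reverses the order of its entries.
    """
    width = len(image[0])
    return [sum(row[c] for row in image) for c in range(width)]


def reversal_invariant(descriptor, image):
    """Keep the lexicographically larger of the two candidate descriptors.

    The image and its flipped copy produce the same pair of candidates,
    so the selected descriptor is the same for both.
    """
    original = descriptor(image)
    reversed_copy = descriptor(flip_lr(image))
    return max(original, reversed_copy)


image = [[1, 2, 3],
         [4, 5, 6]]

# The plain descriptor changes under reversal; the canonicalized one does not.
plain_differs = toy_descriptor(image) != toy_descriptor(flip_lr(image))
invariant_matches = (reversal_invariant(toy_descriptor, image)
                     == reversal_invariant(toy_descriptor, flip_lr(image)))
print(plain_differs, invariant_matches)
```

Under these assumptions the descriptor is computed twice per image, but no reversed copies need to be stored or added to the dataset, which is the trade-off the abstract contrasts with dataset augmentation.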


Pages: 226-250

Page count: 24
