Towards Reversal-Invariant Image Representation

Cited by: 0
Authors
Lingxi Xie
Jingdong Wang
Weiyao Lin
Bo Zhang
Qi Tian
Affiliations
[1] The Johns Hopkins University
[2] Microsoft Research
[3] Shanghai Jiao Tong University
[4] Tsinghua University
[5] University of Texas at San Antonio
Source
International Journal of Computer Vision | 2017, Vol. 123
Keywords
Image classification; BoF; CNN; Reversal-invariant image representation;
DOI
Not available
Abstract
State-of-the-art image classification approaches are mainly based on robust image representation, such as the bag-of-features (BoF) model or the convolutional neural network (CNN) architecture. In real applications, the orientation (left/right) of an image or an object may vary from sample to sample, whereas some handcrafted descriptors (e.g., SIFT) and network operations (e.g., convolution) are not reversal-invariant, leading to unsatisfactory stability of the image features extracted from these models. A popular way to deal with this issue is to augment the dataset by adding a left-right reversed copy of each image. This strategy improves recognition accuracy to some extent, but at the price of almost doubled time and memory consumption in both the training and testing stages. In this paper, we present an alternative solution based on designing reversal-invariant representations of local patterns, so that an identical representation is obtained for an image and its left-right reversed copy. For the BoF model, we design a reversal-invariant version of the SIFT descriptor named Max-SIFT, together with a generalized RIDE algorithm that can be applied to a large family of local descriptors. For the CNN architecture, we present a simple idea of generating reversal-invariant deep features (RI-Deep) and, inspired by it, design reversal-invariant convolution (RI-Conv) layers that increase CNN capacity without increasing model complexity. Experiments reveal consistent accuracy gains on various image classification tasks, including scene understanding, fine-grained object recognition, and large-scale visual recognition.
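The core idea sketched in the abstract can be illustrated with a short example: extract features from an image and from its left-right reversed copy, then combine the two with a symmetric (order-independent) operation, so the result is identical for both orientations. The toy extractor, function names, and the choice of an element-wise maximum below are illustrative assumptions, not the authors' implementation; the paper's Max-SIFT and RI-Conv constructions instead build the invariance into the descriptor and the convolution layer.

```python
import numpy as np

def horizontal_flip(image: np.ndarray) -> np.ndarray:
    """Left-right reversal of an H x W x C image."""
    return image[:, ::-1, :]

def reversal_invariant_features(image: np.ndarray, extract) -> np.ndarray:
    """Combine features of an image and its mirrored copy with an
    element-wise max; since max(a, b) == max(b, a), the output is
    identical for the image and its left-right reversed version."""
    return np.maximum(extract(image), extract(horizontal_flip(image)))

def toy_extract(image: np.ndarray) -> np.ndarray:
    """Stand-in for a real descriptor or CNN (hypothetical): per-channel
    mean intensity of the left and right halves (not reversal-invariant)."""
    w = image.shape[1]
    left, right = image[:, : w // 2], image[:, w // 2 :]
    return np.concatenate([left.mean(axis=(0, 1)), right.mean(axis=(0, 1))])

rng = np.random.default_rng(0)
img = rng.random((64, 64, 3))
f_a = reversal_invariant_features(img, toy_extract)
f_b = reversal_invariant_features(horizontal_flip(img), toy_extract)
assert np.allclose(f_a, f_b)  # same representation for both orientations
```

Any symmetric combination (e.g., element-wise average instead of max) would give the same invariance guarantee; the max is used here only to keep the sketch concrete.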
Pages: 226-250
Number of pages: 24