Approximate Fisher Kernels of Non-iid Image Models for Image Categorization

Cited by: 21
Authors
Cinbis, Ramazan Gokberk [1]
Verbeek, Jakob [2]
Schmid, Cordelia [2]
Affiliations
[1] Milsoft, Ankara, Turkey
[2] Univ Grenoble Alpes, CNRS, Lab Jean Kuntzmann, LEAR Team, Inria Grenoble Rhone Alpes, Grenoble, France
Keywords
Statistical image representations; object recognition; image classification; Fisher kernels; object categorization; classification; descriptors; vector
DOI
10.1109/TPAMI.2015.2484342
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The bag-of-words (BoW) model treats images as sets of local descriptors and represents them by visual word histograms. The Fisher vector (FV) representation extends BoW by considering the first- and second-order statistics of local descriptors. In both representations local descriptors are assumed to be independently and identically distributed (iid), which is a poor assumption from a modeling perspective. It has been experimentally observed that the performance of BoW and FV representations can be improved by employing discounting transformations such as power normalization. In this paper, we introduce non-iid models by treating the model parameters as latent variables which are integrated out, rendering all local regions dependent. Using the Fisher kernel principle we encode an image by the gradient of the data log-likelihood w.r.t. the model hyper-parameters. Our models naturally generate discounting effects in the representations, suggesting that such transformations have proven successful because they closely correspond to the representations obtained for non-iid models. To enable tractable computation, we rely on variational free-energy bounds to learn the hyper-parameters and to compute approximate Fisher kernels. Our experimental results show that our models lead to performance improvements comparable to using power normalization, as employed in state-of-the-art feature aggregation methods.
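The power normalization the abstract refers to is the signed power transform f(z) = sign(z)·|z|^ρ applied element-wise to the FV, usually followed by L2 normalization. A minimal sketch (the function names and the common default ρ = 0.5 are illustrative, not taken from the paper's code):

```python
import math

def power_normalize(v, rho=0.5):
    # Signed power transform: f(z) = sign(z) * |z|**rho.
    # rho = 0.5 (signed square root) is the common choice; smaller rho
    # discounts large components more strongly.
    return [math.copysign(abs(z) ** rho, z) for z in v]

def l2_normalize(v):
    # Scale the vector to unit Euclidean norm (guard against the zero vector).
    norm = math.sqrt(sum(z * z for z in v)) or 1.0
    return [z / norm for z in v]

# A toy 4-dimensional "Fisher vector":
fv = [4.0, -0.25, 0.0, 1.0]
pn = power_normalize(fv)      # [2.0, -0.5, 0.0, 1.0]
out = l2_normalize(pn)        # unit-norm, discounted representation
```

The discounting effect is visible in the toy example: the dominant component (4.0) shrinks relative to the small one (-0.25) after the transform, which is the behavior the paper's non-iid models produce intrinsically.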
Pages: 1084-1098 (15 pages)