Improving the Fisher Kernel for Large-Scale Image Classification

Cited by: 1600
Authors
Perronnin, Florent
Sanchez, Jorge
Mensink, Thomas
Source
COMPUTER VISION-ECCV 2010, PT IV | 2010 / Vol. 6314
DOI
10.1007/978-3-642-15561-1_11
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The Fisher kernel (FK) is a generic framework which combines the benefits of generative and discriminative approaches. In the context of image classification the FK was shown to extend the popular bag-of-visual-words (BOV) by going beyond count statistics. However, in practice, this enriched representation has not yet shown its superiority over the BOV. In the first part we show that with several well-motivated modifications over the original framework we can boost the accuracy of the FK. On PASCAL VOC 2007 we increase the Average Precision (AP) from 47.9% to 58.3%. Similarly, we demonstrate state-of-the-art accuracy on CalTech 256. A major advantage is that these results are obtained using only SIFT descriptors and costless linear classifiers. Equipped with this representation, we can now explore image classification on a larger scale. In the second part, as an application, we compare two abundant resources of labeled images to learn classifiers: ImageNet and Flickr groups. In an evaluation involving hundreds of thousands of training images we show that classifiers learned on Flickr groups perform surprisingly well (although they were not intended for this purpose) and that they can complement classifiers learned on more carefully annotated datasets.
Pages: 143-156
Page count: 14