Free-Form Region Description with Second-Order Pooling

Cited by: 51
Authors
Carreira, Joao [1, 2]
Caseiro, Rui [2]
Batista, Jorge [2]
Sminchisescu, Cristian [3, 4]
Affiliations
[1] Univ Calif Berkeley, Dept Elect Engn & Comp Sci, Berkeley, CA 94720 USA
[2] Univ Coimbra, Inst Syst & Robot, Coimbra, Portugal
[3] Lund Univ, Dept Math, Fac Engn, Lund, Sweden
[4] Romanian Acad, Inst Math, Bucharest, Romania
Keywords
Recognition; image descriptors; second-order statistics; segmentation; regression; pooling; differential geometry; RIEMANNIAN FRAMEWORK; RECOGNITION; CLASSIFICATION; FEATURES; SCALE; SEGMENTATION; CONTOURS; TEXTURE; KERNEL; SHAPES
DOI
10.1109/TPAMI.2014.2361137
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Semantic segmentation and object detection are nowadays dominated by methods operating on regions obtained as a result of a bottom-up grouping process (segmentation) but use feature extractors developed for recognition on fixed-form (e.g. rectangular) patches, with full images as a special case. This is most likely suboptimal. In this paper we focus on feature extraction and description over free-form regions and study the relationship with their fixed-form counterparts. Our main contributions are novel pooling techniques that capture the second-order statistics of local descriptors inside such free-form regions. We introduce second-order generalizations of average and max-pooling that together with appropriate non-linearities, derived from the mathematical structure of their embedding space, lead to state-of-the-art recognition performance in semantic segmentation experiments without any type of local feature coding. In contrast, we show that codebook-based local feature coding is more important when feature extraction is constrained to operate over regions that include both foreground and large portions of the background, as typical in image classification settings, whereas for high-accuracy localization setups, second-order pooling over free-form regions produces results superior to those of the winning systems in the contemporary semantic segmentation challenges, with models that are much faster in both training and testing.
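As a rough illustration of the pooling operators the abstract refers to, the sketch below assumes that second-order average pooling is the mean of the outer products of the local descriptors in a region, that second-order max pooling is their element-wise maximum, and that the non-linearities are a matrix logarithm and a signed power normalization. The function names, the regularization constant eps, and the exponent h are assumptions made for illustration; none of them is taken from this record.

```python
import numpy as np


def second_order_avg_pool(X):
    """Mean of the outer products x_i x_i^T over the N descriptors (rows of X)."""
    X = np.asarray(X, dtype=float)
    return X.T @ X / X.shape[0]


def second_order_max_pool(X):
    """Element-wise maximum of the outer products x_i x_i^T over the rows of X."""
    X = np.asarray(X, dtype=float)
    outer = X[:, :, None] * X[:, None, :]  # shape (N, d, d)
    return outer.max(axis=0)


def log_spd(G, eps=1e-6):
    """Matrix logarithm of a symmetric matrix via eigendecomposition.

    eps * I is added first so that all eigenvalues are strictly positive
    (an assumed regularization, needed when G is rank deficient).
    """
    d = G.shape[0]
    w, V = np.linalg.eigh(G + eps * np.eye(d))
    return (V * np.log(w)) @ V.T


def power_normalize(v, h=0.75):
    """Signed power normalization sign(v) * |v|**h; h is a placeholder value."""
    v = np.asarray(v, dtype=float)
    return np.sign(v) * np.abs(v) ** h


def upper_triangle(G):
    """Vectorize a symmetric matrix by keeping only its upper triangle."""
    return G[np.triu_indices(G.shape[0])]


def describe_region(X, h=0.75):
    """Hypothetical region descriptor: average pool, matrix log, power normalize."""
    return power_normalize(upper_triangle(log_spd(second_order_avg_pool(X))), h)
```

Here X is expected to hold one local descriptor per row, restricted to the pixels or keypoints that fall inside the free-form region; for d-dimensional descriptors the result has d(d+1)/2 entries.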
Pages: 1177-1189
Page count: 13