TextonBoost for Image Understanding: Multi-Class Object Recognition and Segmentation by Jointly Modeling Texture, Layout, and Context

Cited by: 586
Authors
Shotton, Jamie [1]
Winn, John [2]
Rother, Carsten [2]
Criminisi, Antonio [2]
Affiliations
[1] Univ Cambridge, Machine Intelligence Lab, Cambridge CB2 1PZ, England
[2] Microsoft Res Cambridge, Cambridge CB3 0FB, England
Keywords
Image understanding; Object recognition; Segmentation; Texture; Layout; Context; Textons; Conditional random field; Boosting; Semantic image segmentation; Piecewise training; SHAPE; FEATURES
DOI
10.1007/s11263-007-0109-1
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper details a new approach for learning a discriminative model of object classes, incorporating texture, layout, and context information efficiently. The learned model is used for automatic visual understanding and semantic segmentation of photographs. Our discriminative model exploits texture-layout filters, novel features based on textons, which jointly model patterns of texture and their spatial layout. Unary classification and feature selection are achieved using shared boosting, giving an efficient classifier that can be applied to a large number of classes. Accurate image segmentation is achieved by incorporating the unary classifier in a conditional random field, which (i) captures the spatial interactions between class labels of neighboring pixels, and (ii) improves the segmentation of specific object instances. Efficient training of the model on large datasets is achieved by exploiting both random feature selection and piecewise training methods. High classification and segmentation accuracy is demonstrated on four varied databases: (i) the MSRC 21-class database containing photographs of real objects viewed under general lighting conditions, poses and viewpoints, (ii) the 7-class Corel subset and (iii) the 7-class Sowerby database used in He et al. (Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 695-702, June 2004), and (iv) a set of video sequences of television shows. The proposed algorithm gives competitive and visually pleasing results for objects that are highly textured (grass, trees, etc.), highly structured (cars, faces, bicycles, airplanes, etc.), and even articulated (body, cow, etc.).
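As an illustration of how a feature that jointly encodes texture and spatial layout might be computed, the following is a minimal Python sketch, not the authors' implementation. It assumes each pixel has already been assigned a texton index, and it takes the feature response to be the fraction of a rectangle, placed at an offset from the pixel, that is covered by a given texton. That definition, the clipping of rectangles at the image border, and the names `build_texton_integrals` and `texture_layout_response` are assumptions made for this example; the abstract does not spell them out.

```python
def build_texton_integrals(texton_map, n_textons):
    """Build one summed-area table per texton index.

    texton_map is a 2-D list of integer texton indices (assumed precomputed).
    Each table has size (H+1) x (W+1), so table[y][x] is the number of pixels
    with that texton in rows < y and columns < x.
    """
    h, w = len(texton_map), len(texton_map[0])
    tables = []
    for t in range(n_textons):
        ii = [[0] * (w + 1) for _ in range(h + 1)]
        for y in range(h):
            row_sum = 0
            for x in range(w):
                row_sum += 1 if texton_map[y][x] == t else 0
                ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
        tables.append(ii)
    return tables


def texture_layout_response(tables, texton, rect, pixel):
    """Fraction of the offset rectangle covered by `texton`.

    rect  = (dy0, dx0, dy1, dx1): half-open rectangle relative to the pixel.
    pixel = (y, x): the pixel being classified.
    The rectangle is clipped to the image (an assumption of this sketch),
    but the count is still normalised by the full rectangle area.
    """
    dy0, dx0, dy1, dx1 = rect
    y, x = pixel
    ii = tables[texton]
    h, w = len(ii) - 1, len(ii[0]) - 1

    def clip(value, upper):
        return max(0, min(upper, value))

    y0, y1 = clip(y + dy0, h), clip(y + dy1, h)
    x0, x1 = clip(x + dx0, w), clip(x + dx1, w)
    # Four table lookups give the texton count inside the clipped rectangle.
    count = ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]
    area = (dy1 - dy0) * (dx1 - dx0)
    return count / area
```

Because the tables are built once per image, each response costs four lookups regardless of rectangle size, which is what would make it practical for a boosting procedure to evaluate many candidate (rectangle, texton) pairs.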
Pages: 2-23
Page count: 22