A Hierarchical and Contextual Model for Aerial Image Parsing

Cited by: 0
Authors
Jake Porway
Qiongchen Wang
Song Chun Zhu
Affiliations
[1] University of California, Department of Statistics
[2] Lotus Hill Institute for Computer Vision and Information Science
Source
International Journal of Computer Vision | 2010 / Vol. 88
Keywords
Hierarchical models; Scene-level context; Statistical learning; Image understanding; Aerial images; Swendsen-Wang clustering; Bayesian inference
DOI
Not available
Abstract
In this paper we present a hierarchical and contextual model for aerial image understanding. Our model organizes objects (cars, roofs, roads, trees, parking lots) in aerial scenes into hierarchical groups whose appearances and configurations are determined by statistical constraints (e.g., relative position and relative scale). Our hierarchy is a non-recursive grammar for objects in aerial images, composed of layers of nodes that can each decompose into a number of different configurations. This allows us to generate and recognize a vast number of scenes with relatively few rules. We present a minimax entropy framework for learning the statistical constraints between objects and show that this learned context allows us to rule out unlikely scene configurations and hallucinate undetected objects during inference. A similar algorithm was proposed for texture synthesis (Zhu et al. in Int. J. Comput. Vis. 2:107–126, 1998) but did not incorporate hierarchical information. We use a range of bottom-up detectors, namely AdaBoost, TextonBoost, and Compositional Boosting (Freund and Schapire in J. Comput. Syst. Sci. 55, 1997; Shotton et al. in Proceedings of the European Conference on Computer Vision, pp. 1–15, 2006; Wu et al. in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8, 2007), to propose locations of objects in new aerial images, and we employ a cluster sampling algorithm (C4 (Porway and Zhu, 2009)) to choose the subset of detections that best explains the image according to our learned prior model. The C4 algorithm can quickly and efficiently switch between alternate competing sub-solutions, for example whether an image patch is better explained by a parking lot with cars or by a building with vents. We also show that our model can predict the locations of objects our detectors missed.
We conclude by presenting parsed aerial images and experimental results showing that our cluster sampling and top-down prediction algorithms use the learned contextual cues from our model to improve detection results over traditional bottom-up detectors alone.
Pages: 254-283
Page count: 29
Related papers
24 entries in total
[1] Barbu A., Zhu S.-C. (2005) Generalizing Swendsen-Wang to sampling arbitrary posterior probabilities. Pattern Analysis and Machine Intelligence 27, 1239-1253
[2] Felzenszwalb P., Huttenlocher D. (2005) Pictorial structures for object recognition. International Journal of Computer Vision 61, 55-79
[3] Fischler M., Elschlager R. (1973) The representation and matching of pictorial structures. IEEE Transactions on Computers 22, 67-92
[4] Keselman Y., Dickinson S. (2001) Generic model abstraction from examples. Pattern Analysis and Machine Intelligence 27, 1141-1156
[5] Siddiqi K., Shokoufandeh A. (1999) Shock graphs and shape matching. International Journal of Computer Vision 35, 13-32
[6] Image segmentation by data-driven Markov chain Monte Carlo (2002). IEEE Transactions on Pattern Analysis and Machine Intelligence 24, 657-673
[7] Graphical models, exponential families, and variational inference (2008). Foundations and Trends in Machine Learning 1, 1-305
[8] Introduction to a large scale general purpose groundtruth dataset: methodology, annotation tool, and benchmarks (2007). Energy Minimization Methods in Computer Vision and Pattern Recognition 4697, 169-183
[9] A stochastic grammar of images (2006). Foundations and Trends in Computer Graphics and Vision 2, 259-362
[10] Zhu et al. (1998) FRAME: Filters, random fields, and minimax entropy towards a unified theory for texture modeling. International Journal of Computer Vision 2, 107-126