A Semi-Supervised Algorithm for Auto-Annotation and Unknown Structures Discovery in Satellite Image Databases

Cited by: 20
Authors
Blanchart, Pierre [1]
Datcu, Mihai [2]
Affiliations
[1] ParisTech Telecom, TSI, Paris, France
[2] IMF, German Aerosp Ctr DLR, D-82234 Oberpfaffenhofen, Wessling, Germany
Keywords
Bayesian inference; expectation-maximization; Gaussian mixtures; hierarchical Bayesian models; latent variable models; semantic image annotation; semi-supervised learning; unknown image classes discovery
DOI
10.1109/JSTARS.2010.2058794
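Chinese Library Classification
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract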
The increasing number and resolution of earth observation (EO) imaging sensors have had a significant impact on both the volume of acquired image data and the information content of the images. There is consequently a strong need for highly efficient search tools for EO image databases and for search methods that automatically identify and recognize structures within EO images. Content-Based Image Retrieval (CBIR) and automatic image annotation systems have been designed to tackle the problem of image retrieval in large image databases. Both kinds of system pursue a common goal: learning the mapping function between low-level visual features and high-level image semantics. A setting that has hardly been explored in annotation systems, although it is the rule rather than the exception, is one in which the training database used to learn the mapping function does not cover all the semantic classes present in the images. In that setting there exist unknown image classes for which there are no training examples in the training database. In this paper, we propose a semi-supervised method for auto-annotating satellite image databases and discovering unknown semantic image classes in these databases. The idea is to incorporate into the learning process the unannotated data, which by definition contain the unknown image classes. These classes are treated as latent structures in the data that appear when a hierarchical latent variable model is trained on both the labeled and the unlabeled data. We also show that, in our case, the use of unlabeled data leads to more reliable estimates of the model parameters. We present experimental results on a synthetic dataset, comparing our algorithm with a semi-supervised Support Vector Machine (S3VM). We also demonstrate the effectiveness of our unknown image class discovery procedure on a database of SPOT5 satellite images. The results on this database are positive: the newly detected structures correspond to semantic classes that are not represented in the training database.
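The idea of letting unlabeled data reveal a class that has no training examples can be illustrated with a small sketch. This is not the authors' hierarchical latent variable model; it is a simplified, flat Gaussian mixture with diagonal covariances, fitted by expectation-maximization, in which labeled points are clamped to their class and one extra component is left free to absorb structure found only in the unlabeled data. The function name `semi_supervised_gmm`, the initialisation heuristic, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def semi_supervised_gmm(X_lab, y_lab, X_unl, n_known, n_iter=50):
    """EM for a diagonal Gaussian mixture: n_known labeled classes + 1 free component.

    Hypothetical helper for illustration only, not the method of the paper.
    """
    K = n_known + 1
    X = np.vstack([X_lab, X_unl])
    d = X.shape[1]
    # Labeled points keep fixed one-hot responsibilities (clamped labels).
    R_lab = np.eye(K)[y_lab]
    # Known components start at the labeled class means.
    means = np.zeros((K, d))
    for k in range(n_known):
        means[k] = X_lab[y_lab == k].mean(axis=0)
    # Assumed heuristic: the free component starts at the unlabeled point
    # that is farthest from every known class mean.
    dist = ((X_unl[:, None, :] - means[None, :n_known, :]) ** 2).sum(-1).min(axis=1)
    means[n_known] = X_unl[dist.argmax()]
    var = np.tile(X.var(axis=0), (K, 1))
    weights = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: soft assignments for unlabeled points only.
        sq = (X_unl[:, None, :] - means[None]) ** 2
        log_p = np.log(weights) - 0.5 * (
            np.log(2 * np.pi * var).sum(-1) + (sq / var[None]).sum(-1)
        )
        log_p -= log_p.max(axis=1, keepdims=True)
        R_unl = np.exp(log_p)
        R_unl /= R_unl.sum(axis=1, keepdims=True)
        # M-step: labeled and unlabeled points both update the parameters.
        R = np.vstack([R_lab, R_unl])
        Nk = R.sum(axis=0) + 1e-12
        weights = Nk / Nk.sum()
        means = (R.T @ X) / Nk[:, None]
        for k in range(K):
            var[k] = (R[:, k, None] * (X - means[k]) ** 2).sum(axis=0) / Nk[k] + 1e-6
    return means, var, weights, R_unl

# Synthetic example: two labeled classes, a third class present only in unlabeled data.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 6.0]])
X_lab = np.vstack([rng.normal(c, 0.5, size=(20, 2)) for c in centers[:2]])
y_lab = np.repeat([0, 1], 20)
X_unl = np.vstack([rng.normal(c, 0.5, size=(100, 2)) for c in centers])

means, var, weights, R_unl = semi_supervised_gmm(X_lab, y_lab, X_unl, n_known=2)
# Unlabeled points whose most probable component is the free one are
# candidates for an unknown class.
candidates = R_unl.argmax(axis=1) == 2
print("candidate unknown-class points:", int(candidates.sum()))
```

In this toy setting the labeled points anchor the known components, so the free component can only explain what the known classes do not. A real system would also need a criterion for deciding how many free components to allow, which this sketch does not address.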
Pages: 698-717
Page count: 20