A thousand words in a scene

Cited by: 128

Authors:

Quelhas, Pedro

Monay, Florent

Odobez, Jean-Marc

Gatica-Perez, Daniel

Tuytelaars, Tinne

Affiliations:

[1] IDIAP, Res Inst, CH-1920 Martigny, Switzerland

[2] ESAT PSI, B-3001 Louvain, Belgium

Source:

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE | 2007 / Vol. 29 / No. 9

Keywords:

image representation; scene classification; object recognition; quantized local descriptors; latent aspect modeling;

DOI:

10.1109/TPAMI.2007.1155

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This paper presents a novel approach for visual scene modeling and classification, investigating the combined use of text modeling methods and local invariant features. Our work attempts to elucidate 1) whether a textlike bag-of-visterms (BOV) representation (histogram of quantized local visual features) is suitable for scene (rather than object) classification, 2) whether some analogies between discrete scene representations and text documents exist, and 3) whether unsupervised, latent space models can be used both as feature extractors for the classification task and to discover patterns of visual co-occurrence. Using several data sets, we validate our approach, presenting and discussing experiments on each of these issues. We first show, with extensive experiments on binary and multiclass scene classification tasks using a 9,500-image data set, that the BOV representation consistently outperforms classical scene classification approaches. In other data sets, we show that our approach competes with or outperforms other recent more complex methods. We also show that Probabilistic Latent Semantic Analysis (PLSA) generates a compact scene representation, is discriminative for accurate classification, and is more robust than the BOV representation when less labeled training data is available. Finally, through aspect-based image ranking experiments, we show the ability of PLSA to automatically extract visually meaningful scene patterns, making such representation useful for browsing image collections.
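The abstract describes the BOV representation as a histogram of quantized local visual features. A minimal sketch of that idea follows, assuming nearest-centre quantization under squared Euclidean distance. The function names `nearest` and `bov_histogram`, the three-entry vocabulary, and the 2-D descriptors are illustrative assumptions and are not taken from the paper, which this record does not describe at that level of detail.

```python
from collections import Counter

def nearest(descriptor, vocabulary):
    """Index of the vocabulary centre closest to the descriptor (hypothetical helper)."""
    def sq_dist(k):
        return sum((d - c) ** 2 for d, c in zip(descriptor, vocabulary[k]))
    return min(range(len(vocabulary)), key=sq_dist)

def bov_histogram(descriptors, vocabulary):
    """Count how many descriptors fall on each visterm; spatial layout is discarded."""
    counts = Counter(nearest(d, vocabulary) for d in descriptors)
    return [counts[k] for k in range(len(vocabulary))]

# Toy data (assumed): a 3-visterm vocabulary and 5 local descriptors from one image.
vocabulary = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
descriptors = [(0.1, 0.0), (0.9, 0.1), (1.1, -0.1), (0.0, 0.8), (0.05, 0.05)]

hist = bov_histogram(descriptors, vocabulary)
print(hist)
```

In practice the vocabulary would typically be learned by clustering descriptors from a training set, and the resulting histogram would be passed to a classifier or a latent aspect model such as PLSA.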

Citation

Pages: 1575-1589

Page count: 15
