Words Matter: Scene Text for Image Classification and Retrieval

Cited by: 87

Authors:

Karaoglu, Sezer ^{[1]}

Tao, Ran ^{[2]}

Gevers, Theo ^{[1,3]}

Smeulders, Arnold W. M. ^{[2]}

Affiliations:

[1] Univ Amsterdam, Comp Vis Lab, NL-1098 XH Amsterdam, Netherlands

[2] Univ Amsterdam, Intelligent Sensory Informat Syst Lab, NL-1098 XH Amsterdam, Netherlands

[3] Univ Autonoma Barcelona, Comp Vis Ctr, E-08193 Barcelona, Spain

Source:

IEEE TRANSACTIONS ON MULTIMEDIA | 2017 / Vol. 19 / Issue 5

Keywords:

Professional communication; image retrieval; computers and information processing; image analysis; image classification; text recognition; object detection;

DOI:

10.1109/TMM.2016.2638622

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

Text in natural images typically adds meaning to an object or scene. In particular, text specifies which business places serve drinks (e.g., cafe, teahouse) or food (e.g., restaurant, pizzeria), and what kind of service is provided (e.g., massage, repair). The mere presence of text, its words, and their meaning are closely related to the semantics of the object or scene. This paper exploits textual content in images for fine-grained business place classification and logo retrieval. There are four main contributions. First, we show that the textual cues extracted by the proposed method are effective for the two tasks. Combining the proposed textual and visual cues outperforms visual-only classification and retrieval by a large margin. Second, to extract the textual cues, a generic and fully unsupervised word box proposal method is introduced. The method reaches state-of-the-art word detection recall with a limited number of proposals. Third, contrary to what is widely acknowledged in the text detection literature, we demonstrate that high recall in word detection is more important than a high F-score, at least for both tasks considered in this work. Last, this paper provides a large annotated text detection dataset with 10K images and 27,601 word boxes.

Citation

Pages: 1063-1076

Page count: 14
