Scene text detection using graph model built upon maximally stable extremal regions

Cited by: 158

Authors:

Shi, Cunzhao [1]

Wang, Chunheng [1]

Xiao, Baihua [1]

Zhang, Yang [1]

Gao, Song [1]

Affiliation:

[1] Chinese Acad Sci, State Key Lab Management & Control Complex Syst, Inst Automat, Beijing 100190, Peoples R China

Source:

PATTERN RECOGNITION LETTERS | 2013 / Vol. 34 / Issue 02

Funding:

National Natural Science Foundation of China;

Keywords:

Scene text detection; MSER; Graph model; Cost function; Graph cut; VIDEO; IMAGES;

DOI:

10.1016/j.patrec.2012.09.019

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Scene text detection can be formulated as a bi-label (text and non-text regions) segmentation problem. However, due to the high degree of intraclass variation of scene characters as well as the limited number of training samples, a single information source or classifier is not enough to segment text from the non-text background. Thus, in this paper, we propose a novel scene text detection approach using a graph model built upon Maximally Stable Extremal Regions (MSERs) to incorporate various information sources into one framework. Concretely, after detecting MSERs in the original image, an irregular graph whose nodes are MSERs is constructed to label MSERs as text regions or non-text ones. Carefully designed features contribute to the unary potential, which assesses the individual penalties for labeling an MSER node as text or non-text, and color and geometric features are used to define the pairwise potential, which penalizes likely discontinuities. By minimizing the cost function via the graph cut algorithm, the different information carried by the cost function can be optimally balanced to get the final MSER labeling result. The proposed method is naturally context-relevant and scale-insensitive. Experimental results on the ICDAR 2011 competition dataset show that the proposed approach outperforms state-of-the-art methods in both recall and precision. © 2012 Elsevier B.V. All rights reserved.
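The labeling step described in the abstract can be illustrated with a small sketch. It assumes a cost of the generic form E(L) = sum_i U_i(l_i) + sum_(i,j) w_ij * [l_i != l_j], which is a common shape for two-label graph-cut problems and not necessarily the exact cost function of the paper. The function names `min_cut_labels` and `energy`, the toy costs, and the plain Edmonds-Karp max-flow solver are all illustrative assumptions; the record above does not give the paper's features, weights, or solver details.

```python
from collections import deque

def energy(labels, unary, pairwise):
    """Total cost of a labeling (1 = text, 0 = non-text)."""
    total = sum(unary[i][0] if l == 1 else unary[i][1] for i, l in enumerate(labels))
    total += sum(w for (i, j), w in pairwise.items() if labels[i] != labels[j])
    return total

def min_cut_labels(unary, pairwise):
    """Minimize the two-label cost with an s-t min cut (Edmonds-Karp).

    unary: list of (cost_if_text, cost_if_non_text), one pair per region.
    pairwise: {(i, j): w}, penalty paid when regions i and j get different labels.
    """
    n = len(unary)
    s, t = n, n + 1                      # source = text side, sink = non-text side
    cap = [dict() for _ in range(n + 2)]

    def add_edge(u, v, c):
        cap[u][v] = cap[u].get(v, 0.0) + c
        cap[v].setdefault(u, 0.0)        # residual (reverse) edge

    for i, (cost_text, cost_non_text) in enumerate(unary):
        add_edge(s, i, cost_non_text)    # cut if region i ends on the non-text side
        add_edge(i, t, cost_text)        # cut if region i ends on the text side
    for (i, j), w in pairwise.items():
        add_edge(i, j, w)
        add_edge(j, i, w)

    def reachable():
        """BFS over residual edges; stops early once the sink is found."""
        parent = {s: None}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    if v == t:
                        return parent
                    queue.append(v)
        return parent

    while True:
        parent = reachable()
        if t not in parent:
            break                        # no augmenting path left: flow is maximal
        bottleneck, v = float("inf"), t
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u

    # Regions still reachable from the source lie on the text side of the cut.
    return [1 if i in parent else 0 for i in range(n)]

# Toy example with four hypothetical regions; all costs are made up.
unary = [(0.2, 2.0), (0.3, 1.5), (1.2, 1.0), (2.0, 0.1)]
pairwise = {(0, 1): 1.0, (1, 2): 0.8, (2, 3): 0.1}
labels = min_cut_labels(unary, pairwise)
print(labels, energy(labels, unary, pairwise))
```

In this toy input, region 2 slightly prefers the non-text label on its own, but its strong link to region 1 makes the text label cheaper overall, which is the kind of context effect the pairwise term is meant to capture.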

Citation

Pages: 107-116

Page count: 10

24 references in total

[1] Boykov, Y; Kolmogorov, V. An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2004, 26 (09): 1124-1137

[2] Breiman, L. Random forests [J]. MACHINE LEARNING, 2001, 45 (01): 5-32

[3] CANNY, J. A COMPUTATIONAL APPROACH TO EDGE-DETECTION [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 1986, 8 (06): 679-698

[4] Chen, DT; Odobez, JM; Bourlard, H. Text detection and recognition in images and video frames [J]. PATTERN RECOGNITION, 2004, 37 (03): 595-608

[5] Chen H., 2011, 2011 18th IEEE International Conference on Image Processing (ICIP 2011), P2609, DOI 10.1109/ICIP.2011.6116200

[6] Chen XR, 2004, PROC CVPR IEEE, P366

[7] Dalal, N; Triggs, B. Histograms of oriented gradients for human detection [J]. 2005 IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, VOL 1, PROCEEDINGS, 2005: 886-893

[8] de Campos Teofilo Emidio, 2009, VISAPP, V2

[9] Epshtein B, 2010, PROC CVPR IEEE, P2963, DOI 10.1109/CVPR.2010.5540041

[10] Jung, Cheolkon; Liu, Qifeng; Kim, Joongkyu. A stroke filter and its application to text localization [J]. PATTERN RECOGNITION LETTERS, 2009, 30 (02): 114-122
