Scene text detection using graph model built upon maximally stable extremal regions

Cited by: 158
Authors
Shi, Cunzhao [1]
Wang, Chunheng [1]
Xiao, Baihua [1]
Zhang, Yang [1]
Gao, Song [1]
Affiliations
[1] Chinese Acad Sci, State Key Lab Management & Control Complex Syst, Inst Automat, Beijing 100190, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Scene text detection; MSER; Graph model; Cost function; Graph cut; VIDEO; IMAGES;
DOI
10.1016/j.patrec.2012.09.019
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Scene text detection can be formulated as a bi-label (text and non-text regions) segmentation problem. However, due to the high degree of intraclass variation of scene characters as well as the limited number of training samples, a single information source or classifier is not enough to segment text from the non-text background. Thus, in this paper, we propose a novel scene text detection approach using a graph model built upon Maximally Stable Extremal Regions (MSERs) to incorporate various information sources into one framework. Concretely, after detecting MSERs in the original image, an irregular graph whose nodes are MSERs is constructed to label MSERs as text regions or non-text ones. Carefully designed features contribute to the unary potential, which assesses the individual penalties for labeling an MSER node as text or non-text, and color and geometric features are used to define the pairwise potential, which penalizes the likely discontinuities. By minimizing the cost function via the graph cut algorithm, the different information carried by the cost function can be optimally balanced to get the final MSER labeling result. The proposed method is naturally context-relevant and scale-insensitive. Experimental results on the ICDAR 2011 competition dataset show that the proposed approach outperforms state-of-the-art methods in both recall and precision. © 2012 Elsevier B.V. All rights reserved.
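The labeling step described in the abstract can be illustrated with a minimal sketch. It assumes the common binary energy E(x) = sum_i U_i(x_i) + sum_(i,j) w_ij [x_i != x_j], minimized exactly by an s-t minimum cut. The function name `min_cut_labels`, the cost values, and the three-node graph are hypothetical; the paper's actual features and potentials are not given in this record, and the max-flow routine here is a plain Edmonds-Karp implementation, not necessarily the solver the authors used.

```python
from collections import deque


def min_cut_labels(unary, pairwise):
    """Label nodes as text (True) or non-text (False) by an s-t minimum cut.

    unary:    list of (cost_if_text, cost_if_non_text), one pair per node.
    pairwise: dict {(i, j): weight}, paid when nodes i and j get different labels.
    Returns (labels, energy), where energy is the minimum of the cost function.
    """
    n = len(unary)
    s, t = n, n + 1  # source stands for "text", sink for "non-text"
    cap = [[0] * (n + 2) for _ in range(n + 2)]
    for i, (cost_text, cost_non_text) in enumerate(unary):
        cap[s][i] = cost_non_text  # cut when node i ends up on the sink side
        cap[i][t] = cost_text      # cut when node i stays on the source side
    for (i, j), w in pairwise.items():
        cap[i][j] += w
        cap[j][i] += w

    def bfs_parents():
        # Breadth-first search over edges with remaining residual capacity.
        parent = [-1] * (n + 2)
        parent[s] = s
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in range(n + 2):
                if parent[v] == -1 and cap[u][v] > 1e-9:
                    parent[v] = u
                    queue.append(v)
        return parent

    flow = 0
    while True:
        parent = bfs_parents()
        if parent[t] == -1:
            break  # no augmenting path left: the flow is maximal
        # Find the bottleneck capacity along the path, then push that much flow.
        bottleneck = float("inf")
        v = t
        while v != s:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u
        flow += bottleneck

    # Nodes still reachable from the source in the residual graph are "text".
    labels = [parent[i] != -1 for i in range(n)]
    return labels, flow


# Hypothetical three-region chain: region 2 alone slightly prefers non-text,
# but its strong link to region 1 makes the all-text labeling cheaper.
labels, energy = min_cut_labels(
    unary=[(1, 9), (2, 8), (6, 5)],
    pairwise={(0, 1): 3, (1, 2): 4},
)
print(labels, energy)  # [True, True, True] 9
```

In this toy case, labeling region 2 as non-text would save 1 in unary cost but add 4 in pairwise cost, which is the kind of context-dependent trade-off the abstract attributes to the combined cost function.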
Pages: 107-116
Page count: 10