Could scene context be beneficial for scene text detection?

Cited by: 31
Authors
Zhu, Anna [1]
Gao, Renwu [2]
Uchida, Seiichi [2]
Affiliations
[1] Huazhong Univ Sci & Technol, Sch Automat, State Key Lab Multispectral Informat Proc Technol, Wuhan 430074, Peoples R China
[2] Kyushu Univ, Human Interface Lab, Informat Sci & Elect Engn, Fukuoka 812, Japan
Keywords
Scene text detection; Fully connected CRF; Convolutional neural network; Character feature; Context feature; READING TEXT; SEGMENTATION; IMAGE; RECOGNITION;
DOI
10.1016/j.patcog.2016.04.011
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Scene text detection and scene segmentation are both important tasks in computer vision. Can semantic scene segmentation assist scene text detection to any degree? For example, can we expect the probability that a region is text to be low if its surrounding segment, i.e., its context, is labeled as sky? In this paper, we give a positive answer by constructing a scene context-based text detection model. In this model, we use texton features and a fully connected conditional random field (CRF) to estimate pixel-level scene class probabilities, which serve as the image's context feature. Meanwhile, maximally stable extremal regions (MSERs) are extracted, integrated, and extended into image patches of character candidates. Each image patch is then fed to a simple two-layer convolutional neural network (CNN) that automatically extracts its character feature. The context feature averaged over the corresponding patch is taken as the patch's context feature. The character feature and context feature are fused and fed into a support vector machine for text/non-text classification. Finally, as post-processing, neighboring text regions are grouped hierarchically. Performance evaluation on the ICDAR2013 and SVT databases, together with a preliminary evaluation on a patch-level database, shows that scene context can improve scene text detection performance. Moreover, a comparative study with state-of-the-art methods shows the top-level performance of our method. (C) 2016 Elsevier Ltd. All rights reserved.
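The patch-level context averaging and feature fusion described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names `patch_context_feature` and `fuse_features`, the `(top, left, bottom, right)` box convention, and the use of plain concatenation for fusion are all assumptions made for illustration; the abstract does not specify how the two features are combined.

```python
from typing import List, Sequence, Tuple

# Hypothetical layout: prob_map[y][x][k] is the probability that pixel (y, x)
# belongs to scene class k (e.g. sky, building, road), as a CRF might output.
ProbMap = Sequence[Sequence[Sequence[float]]]


def patch_context_feature(prob_map: ProbMap,
                          box: Tuple[int, int, int, int]) -> List[float]:
    """Average the per-pixel scene-class probabilities inside a patch.

    box is (top, left, bottom, right); bottom and right are exclusive.
    """
    top, left, bottom, right = box
    sums: List[float] = []
    count = 0
    for row in prob_map[top:bottom]:
        for pixel in row[left:right]:
            if not sums:
                sums = [0.0] * len(pixel)
            for k, p in enumerate(pixel):
                sums[k] += p
            count += 1
    if count == 0:
        raise ValueError("patch covers no pixels")
    return [s / count for s in sums]


def fuse_features(char_feature: Sequence[float],
                  context_feature: Sequence[float]) -> List[float]:
    """Fuse the two features by concatenation (an assumed fusion rule)."""
    return list(char_feature) + list(context_feature)


# Toy 2x2 image with two scene classes.
toy_map = [
    [[1.0, 0.0], [0.5, 0.5]],
    [[0.0, 1.0], [0.5, 0.5]],
]
context = patch_context_feature(toy_map, (0, 0, 2, 2))
fused = fuse_features([0.1, 0.2], context)
```

In this sketch, `fused` stands in for the vector that would be passed to a text/non-text classifier such as a support vector machine; the character feature `[0.1, 0.2]` is a placeholder for the CNN output.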
Pages: 204-215
Number of pages: 12