Script identification in the wild via discriminative convolutional neural network

Cited by: 95
Authors
Shi, Baoguang [1]
Bai, Xiang [1]
Yao, Cong [1]
Affiliations
[1] Huazhong Univ Sci & Technol, Sch Elect Informat & Commun, Wuhan 430074, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Script identification; Convolutional neural network; Mid-level representation; Discriminative clustering; Dataset; INVARIANT TEXTURE FEATURES; IMAGES; EXTRACTION; RECOGNITION; SYSTEM; VIDEO; FRAMEWORK; ROTATION;
DOI
10.1016/j.patcog.2015.11.005
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Script identification facilitates many important applications in document and video analysis. This paper investigates a relatively new problem: identifying scripts in natural images. The basic idea is to combine deep features and mid-level representations into a globally trainable deep model. Specifically, a set of deep feature maps is first extracted from the input images by a pre-trained CNN model, and local deep features are densely collected from these maps. Then, discriminative clustering is performed to learn a set of discriminative patterns based on such local features. A mid-level representation is obtained by encoding the local features based on the learned discriminative patterns (codebook). Finally, the mid-level representations and the deep features are jointly optimized in a deep network. Benefiting from such a fine-grained classification strategy, the optimized deep model, termed Discriminative Convolutional Neural Network (DisCNN), is capable of effectively revealing the subtle differences among scripts that are difficult to distinguish, e.g. Chinese and Japanese. In addition, a large-scale dataset containing 16,291 in-the-wild text images in 13 scripts, namely SIW-13, is created for evaluation. Our method is not limited to identifying text images, and performs effectively on video and document scripts as well, without requiring any preprocessing such as binarization or segmentation, or any hand-crafted features. The experimental comparisons on datasets including SIW-13, CVSI-2015 and Multi-Script consistently demonstrate that DisCNN is a state-of-the-art approach for script identification. (C) 2015 Elsevier Ltd. All rights reserved.
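The encoding step described in the abstract (local features scored against a codebook of learned patterns) can be illustrated with a minimal sketch. This is not the authors' implementation: the random arrays stand in for a CNN feature map and a learned codebook, the linear scoring and max-pooling are assumptions chosen for illustration, and the names `dense_local_features` and `encode_mid_level` are hypothetical.

```python
import numpy as np


def dense_local_features(feature_map):
    """Flatten a (C, H, W) feature map into H*W local descriptors of length C."""
    c, h, w = feature_map.shape
    return feature_map.reshape(c, h * w).T  # shape (H*W, C)


def encode_mid_level(local_feats, codebook_w, codebook_b):
    """Score each local descriptor against K patterns, then max-pool over locations.

    Assumption: each pattern is a linear detector (weight vector plus bias).
    """
    responses = local_feats @ codebook_w.T + codebook_b  # shape (H*W, K)
    return responses.max(axis=0)  # shape (K,)


# Placeholder data: random values stand in for real CNN features and a learned codebook.
rng = np.random.default_rng(0)
feature_map = rng.standard_normal((8, 4, 6))  # C=8 channels, 4x6 spatial grid
codebook_w = rng.standard_normal((5, 8))      # K=5 hypothetical patterns
codebook_b = np.zeros(5)

local_feats = dense_local_features(feature_map)
mid_level = encode_mid_level(local_feats, codebook_w, codebook_b)
print(local_feats.shape, mid_level.shape)
```

Max-pooling over locations makes the resulting vector independent of where a pattern appears in the image, which is one common way to turn dense local responses into a fixed-length representation.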
Pages: 448-458
Page count: 11