WordSup: Exploiting Word Annotations for Character based Text Detection

Cited by: 127

Authors:

Hu, Han [1]

Zhang, Chengquan [2]

Luo, Yuxuan [2]

Wang, Yuzhuo [2]

Han, Junyu [2]

Ding, Errui [2]

Affiliations:

[1] Microsoft Research Asia, Beijing, People's Republic of China

[2] Baidu Research, Institute of Deep Learning (IDL), Sunnyvale, CA, USA

Source:

2017 IEEE International Conference on Computer Vision (ICCV) | 2017

Keywords:

DOI:

10.1109/ICCV.2017.529

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Text in images is usually organized as a hierarchy of several visual elements, i.e. characters, words, text lines and text blocks. Among these elements, the character is the most basic one across scripts and notations such as Western, Chinese, Japanese and mathematical expressions. It is therefore natural and convenient to build a common text detection engine on top of character detectors. However, training character detectors requires a vast number of location-annotated characters, which are expensive to obtain, and existing real text datasets are mostly annotated at the word or line level. To resolve this dilemma, we propose a weakly supervised framework that can use word annotations, either tight quadrangles or looser bounding boxes, to train character detectors. Applied to scene text detection, the framework lets us train a robust character detector by exploiting the word annotations in rich, large-scale real scene text datasets, e.g. ICDAR15 [19] and COCO-text [39]. The character detector plays a key role in the pipeline of our text detection engine. It achieves state-of-the-art performance on several challenging scene text detection benchmarks. We also demonstrate the flexibility of our pipeline in various scenarios, including deformed text detection and math expression recognition.

References

Pages: 4950-4959

Page count: 10

51 entries in total

[1] Aho A. V., 1983, DATA STRUCTURES ALGO

[2] [Anonymous], 2015, CORR

[3] [Anonymous], ARXIV E PRINTS

[4] [Anonymous], 2017, P IEEE C COMPUTER VI

[5] [Anonymous], ABS160309423 CORR

[6] Bookstein, F. L. Principal warps: thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1989, 11(6): 567-585

[7] Chen H., 2011, 2011 18th IEEE International Conference on Image Processing (ICIP 2011), P2609, DOI 10.1109/ICIP.2011.6116200

[8] Dai, Jifeng; He, Kaiming; Sun, Jian. BoxSup: Exploiting Bounding Boxes to Supervise Convolutional Networks for Semantic Segmentation. 2015 IEEE International Conference on Computer Vision (ICCV), 2015: 1635-1643

[9] Epshtein B, 2010, PROC CVPR IEEE, P2963, DOI 10.1109/CVPR.2010.5540041

[10] Gers, F. A.; Schmidhuber, J.; Cummins, F. Learning to forget: Continual prediction with LSTM. Neural Computation, 2000, 12(10): 2451-2471
