TextField: Learning a Deep Direction Field for Irregular Scene Text Detection

Cited by: 242
Authors
Xu, Yongchao [1]
Wang, Yukang [1]
Zhou, Wei [1]
Wang, Yongpan [2]
Yang, Zhibo [2]
Bai, Xiang [1]
Affiliations
[1] Huazhong Univ Sci & Technol, Sch Elect Informat & Commun, Wuhan 430074, Hubei, Peoples R China
[2] Alibaba Grp, Hangzhou 311121, Zhejiang, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Scene text detection; multi-oriented text; curved text; deep neural networks; COMPONENT TREE; NEURAL-NETWORK; RECOGNITION; IMAGE;
DOI
10.1109/TIP.2019.2900589
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Scene text detection is an important step in a scene text reading system. The main challenges lie in significantly varied sizes and aspect ratios and in arbitrary orientations and shapes. Driven by recent progress in deep learning, impressive performance has been achieved for multi-oriented text detection. Yet performance drops dramatically when detecting curved text because of the limited text representations in use (e.g., horizontal bounding boxes, rotated rectangles, or quadrilaterals). Detecting curved text, which is actually very common in natural scenes, is therefore of great interest. In this paper, we present a novel text detector named TextField for detecting irregular scene text. Specifically, we learn a direction field that, at each text point, points away from the nearest text boundary. This direction field is represented by an image of 2D vectors and learned via a fully convolutional neural network. It encodes both the binary text mask and the direction information used to separate adjacent text instances, which is challenging for classical segmentation-based approaches. Based on the learned direction field, we apply a simple yet effective morphology-based post-processing step to obtain the final detection. The experimental results show that the proposed TextField outperforms the state-of-the-art methods by a large margin (28% and 8%) on two curved text datasets, Total-Text and SCUT-CTW1500, respectively; TextField also achieves very competitive performance on the multi-oriented datasets ICDAR 2015 and MSRA-TD500. Furthermore, TextField generalizes robustly to unseen datasets.
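The direction field described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names `direction_field` and `text_mask_from_field` are hypothetical, the nearest non-text pixel is used here as a stand-in for the nearest text boundary point, the search is brute force and only suitable for tiny masks, and recovering the mask by thresholding the vector magnitude at 0.5 is an assumption made for illustration.

```python
import math


def direction_field(mask):
    """Build a toy direction field from a binary text mask.

    mask: list of rows of 0/1 values (1 = text pixel).
    Returns a grid of (dx, dy) tuples: a unit vector pointing away from
    the nearest non-text pixel for text pixels, (0.0, 0.0) elsewhere.
    """
    h, w = len(mask), len(mask[0])
    # All non-text pixels, stored as (x, y).
    background = [(x, y) for y in range(h) for x in range(w) if not mask[y][x]]
    field = [[(0.0, 0.0)] * w for _ in range(h)]
    if not background:
        return field  # no boundary to point away from
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            # Brute-force nearest non-text pixel (ties: first in scan order).
            nx, ny = min(
                background,
                key=lambda q: (q[0] - x) ** 2 + (q[1] - y) ** 2,
            )
            dx, dy = x - nx, y - ny
            norm = math.hypot(dx, dy)  # > 0 because (x, y) is a text pixel
            field[y][x] = (dx / norm, dy / norm)
    return field


def text_mask_from_field(field, threshold=0.5):
    """Recover a binary mask by thresholding the vector magnitude (assumed rule)."""
    return [
        [1 if math.hypot(dx, dy) > threshold else 0 for dx, dy in row]
        for row in field
    ]


if __name__ == "__main__":
    toy_mask = [[0, 1, 1, 1, 0]]
    toy_field = direction_field(toy_mask)
    print(toy_field[0][1], toy_field[0][3])
    print(text_mask_from_field(toy_field))
```

In this toy row, the text pixels at the two ends point in opposite directions, away from the background on their own side. Opposing directions of this kind are what allow a field to separate adjacent text instances that a plain binary mask would merge.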
Pages: 5566-5579
Number of pages: 14