Text-Attentional Convolutional Neural Network for Scene Text Detection

Cited by: 233
Authors
He, Tong [1, 2]
Huang, Weilin [1, 3]
Qiao, Yu [1, 3]
Yao, Jian [2]
Affiliations
[1] Chinese Acad Sci, Shenzhen Inst Adv Technol, Shenzhen 518055, Peoples R China
[2] Wuhan Univ, Sch Remote Sensing & Informat Engn, Wuhan 430072, Peoples R China
[3] Chinese Univ Hong Kong, Multimedia Lab, Hong Kong, Hong Kong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Maximally stable extremal regions; text detector; convolutional neural networks; multi-level supervised information; multi-task learning; READING TEXT; LOCALIZATION;
DOI
10.1109/TIP.2016.2547588
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recent deep learning models have demonstrated strong capabilities for classifying text and non-text components in natural images. They extract a high-level feature globally computed from a whole image component (patch), where the cluttered background information may dominate true text features in the deep representation. This leads to less discriminative power and poorer robustness. In this paper, we present a new system for scene text detection by proposing a novel text-attentional convolutional neural network (Text-CNN) that particularly focuses on extracting text-related regions and features from the image components. We develop a new learning mechanism to train the Text-CNN with multi-level and rich supervised information, including text region mask, character label, and binary text/non-text information. The rich supervision information gives the Text-CNN a strong capability for discriminating ambiguous texts, and also increases its robustness against complicated background components. The training process is formulated as a multi-task learning problem, where low-level supervised information greatly facilitates the main task of text/non-text classification. In addition, a powerful low-level detector called contrast-enhancement maximally stable extremal regions (MSERs) is developed, which extends the widely used MSERs by enhancing intensity contrast between text patterns and background. This allows it to detect highly challenging text patterns, resulting in a higher recall. Our approach achieved promising results on the ICDAR 2013 data set, with an F-measure of 0.82, substantially improving the state-of-the-art results.
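The multi-task formulation described in the abstract can be illustrated with a minimal sketch. The abstract names three kinds of supervision (text region mask, character label, binary text/non-text) but does not give the loss functions or their weights, so everything below is an assumption made for illustration: the choice of cross-entropy and mean squared error, the function names, and the weights `w_char` and `w_mask` are hypothetical and are not taken from the paper.

```python
import math


def binary_cross_entropy(p_text, is_text):
    """Main task: text/non-text loss. p_text is the predicted P(text), is_text is 0 or 1."""
    eps = 1e-12
    p = min(max(p_text, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(is_text * math.log(p) + (1 - is_text) * math.log(1.0 - p))


def softmax_cross_entropy(logits, label):
    """Auxiliary task: character-label loss computed from raw class scores."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def mask_mse(pred_mask, true_mask):
    """Auxiliary task: mean squared error between flattened region masks."""
    return sum((a - b) ** 2 for a, b in zip(pred_mask, true_mask)) / len(pred_mask)


def multi_task_loss(p_text, is_text, char_logits, char_label,
                    pred_mask, true_mask, w_char=1.0, w_mask=1.0):
    """Weighted sum of the main loss and two auxiliary losses.

    w_char and w_mask are hypothetical weights, not values from the paper.
    """
    main = binary_cross_entropy(p_text, is_text)
    aux_char = softmax_cross_entropy(char_logits, char_label)
    aux_mask = mask_mse(pred_mask, true_mask)
    return main + w_char * aux_char + w_mask * aux_mask


if __name__ == "__main__":
    loss = multi_task_loss(
        p_text=0.9, is_text=1,
        char_logits=[2.0, 0.5, -1.0], char_label=0,
        pred_mask=[0.8, 0.1, 0.9, 0.2], true_mask=[1.0, 0.0, 1.0, 0.0],
    )
    print(f"combined loss: {loss:.4f}")
```

Setting both auxiliary weights to zero reduces the sketch to plain text/non-text classification, which is the baseline that the abstract says the extra supervision improves on.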
Pages: 2529 - 2541
Number of pages: 13