TextMountain: Accurate scene text detection via instance segmentation

Cited by: 66
Authors
Zhu, Yixing [1]
Du, Jun [1]
Affiliations
[1] Univ Sci & Technol China, Natl Engn Lab Speech & Language Informat Proc, Hefei, Anhui, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China
Keywords
Scene text detection; Curved text; Multi-oriented text; CNN; Deep learning; RECOGNITION;
DOI
10.1016/j.patcog.2020.107336
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we propose a novel scene text detection method named TextMountain. The key idea of TextMountain is to make full use of border-center information. Unlike previous works that treat center-border as a binary classification problem, we predict the text center-border probability (TCBP) and the text center-direction (TCD). The TCBP resembles a mountain whose top is the text center and whose foot is the text border. The mountaintop can separate text instances that cannot easily be separated using a semantic segmentation map, and its rising direction can plan a road to the top for each pixel at the mountain foot in the grouping stage. The TCD helps the TCBP to be learned better. Our labeling rules do not lead to ambiguity under angle transformation, so the proposed method is robust to multi-oriented text and also handles curved text well. In the inference stage, each pixel at the mountain foot needs to search for a path to the mountaintop; this process can be completed efficiently in parallel, which makes our method efficient compared with others. Experiments on the MLT, ICDAR2015, RCTW-17 and SCUT-CTW1500 datasets demonstrate that the proposed method achieves better or comparable performance in terms of both accuracy and efficiency. Notably, our method achieves an F-measure of 76.85% on MLT, which outperforms previous methods by a large margin. Code will be made available. © 2020 Elsevier Ltd. All rights reserved.
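The grouping idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the TCBP map is already available as a 2D grid of scores in [0, 1], it climbs by steepest ascent on the TCBP values rather than using the predicted TCD, and the names `label_peaks` and `group_pixels` as well as the thresholds 0.6 and 0.1 are hypothetical choices made for illustration only.

```python
from collections import deque

# 8-connected neighbourhood offsets as (dy, dx).
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
              (0, 1), (1, -1), (1, 0), (1, 1)]


def label_peaks(tcbp, peak_thresh):
    """Give each connected region with score >= peak_thresh its own label (1, 2, ...)."""
    h, w = len(tcbp), len(tcbp[0])
    labels = [[0] * w for _ in range(h)]
    next_label = 0
    for y in range(h):
        for x in range(w):
            if tcbp[y][x] < peak_thresh or labels[y][x]:
                continue
            next_label += 1
            labels[y][x] = next_label
            queue = deque([(y, x)])
            while queue:  # breadth-first flood fill of one mountaintop
                cy, cx = queue.popleft()
                for dy, dx in NEIGHBOURS:
                    ny, nx = cy + dy, cx + dx
                    if (0 <= ny < h and 0 <= nx < w
                            and tcbp[ny][nx] >= peak_thresh
                            and not labels[ny][nx]):
                        labels[ny][nx] = next_label
                        queue.append((ny, nx))
    return labels


def group_pixels(tcbp, peak_thresh=0.6, text_thresh=0.1):
    """Assign every text pixel to the mountaintop reached by climbing uphill.

    Thresholds are illustrative assumptions, not values from the paper.
    """
    h, w = len(tcbp), len(tcbp[0])
    labels = label_peaks(tcbp, peak_thresh)
    result = [row[:] for row in labels]
    for y in range(h):
        for x in range(w):
            if labels[y][x] or tcbp[y][x] < text_thresh:
                continue  # already a peak pixel, or background
            cy, cx = y, x
            while not labels[cy][cx]:
                # Step to the strictly higher neighbour with the largest score.
                best_score, by, bx = tcbp[cy][cx], cy, cx
                for dy, dx in NEIGHBOURS:
                    ny, nx = cy + dy, cx + dx
                    if 0 <= ny < h and 0 <= nx < w and tcbp[ny][nx] > best_score:
                        best_score, by, bx = tcbp[ny][nx], ny, nx
                if (by, bx) == (cy, cx):
                    break  # local maximum below the peak threshold: stays unassigned
                cy, cx = by, bx
            result[y][x] = labels[cy][cx]
    return result


# Two adjacent "mountains" that a plain text/non-text mask would merge into one blob.
scores = [[0.0, 0.2, 0.5, 0.9, 0.5, 0.2, 0.3, 0.7, 0.9, 0.4, 0.0]]
print(group_pixels(scores))
```

Each pixel's climb depends only on the score map, so the climbs are independent of one another; the sketch runs them in a sequential loop for readability, whereas the abstract notes that this step can be carried out in parallel.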
Pages: 11
Related papers
58 in total
  • [1] Deep Watershed Transform for Instance Segmentation
    Bai, Min
    Urtasun, Raquel
    [J]. 30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, : 2858 - 2866
  • [2] Text/non-text image classification in the wild with convolutional neural networks
    Bai, Xiang
    Shi, Baoguang
    Zhang, Chengquan
    Cai, Xuan
    Qi, Li
    [J]. PATTERN RECOGNITION, 2017, 66 : 437 - 446
  • [3] DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs
    Chen, Liang-Chieh
    Papandreou, George
    Kokkinos, Iasonas
    Murphy, Kevin
    Yuille, Alan L.
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2018, 40 (04) : 834 - 848
  • [4] Dai YC, 2018, INT C PATT RECOG, P3604, DOI 10.1109/ICPR.2018.8546066
  • [5] De Brabandere B., 2017, Semantic instance segmentation with a discriminative loss function, DOI 10.48550/ARXIV.1708.02551
  • [6] Deng D, 2018, AAAI CONF ARTIF INTE, P6773
  • [7] Deng J, 2009, PROC CVPR IEEE, P248, DOI 10.1109/CVPRW.2009.5206848
  • [8] Epshtein B, 2010, PROC CVPR IEEE, P2963, DOI 10.1109/CVPR.2010.5540041
  • [9] The Pascal Visual Object Classes (VOC) Challenge
    Everingham, Mark
    Van Gool, Luc
    Williams, Christopher K. I.
    Winn, John
    Zisserman, Andrew
    [J]. INTERNATIONAL JOURNAL OF COMPUTER VISION, 2010, 88 (02) : 303 - 338
  • [10] Girshick R., 2017, P IEEE C COMP VIS PA, DOI 10.1109/CVPR.2017.106