Boundary-Aware Arbitrary-Shaped Scene Text Detector With Learnable Embedding Network

Cited by: 7
Authors
Xing, Mengting [1 ]
Xie, Hongtao [1 ]
Tan, Qingfeng [2 ]
Fang, Shancheng [1 ]
Wang, Yuxin [1 ]
Zha, Zhengjun [1 ]
Zhang, Yongdong [1 ]
Affiliations
[1] Univ Sci & Technol China, Sch Informat Sci & Technol, Hefei 230052, Anhui, Peoples R China
[2] Guangzhou Univ, Cyberspace Inst Adv Technol, Guangzhou 511442, Guangdong, Peoples R China
Keywords
Feature extraction; Task analysis; Proposals; Noise measurement; Detectors; Shape; Noise reduction; Scene text detection; boundary representation; false positive suppression;
DOI
10.1109/TMM.2021.3093727
CLC number
TP [Automation Technology, Computer Technology];
Subject classification code
0812 ;
Abstract
Benefiting from advances in deep learning, scene text detection algorithms have developed rapidly in recent years. Methods that represent text regions with segmentation maps have been shown to capture arbitrary-shaped text more flexibly and accurately. However, such segmentation-based methods are easily disturbed by text-like background patterns (e.g., fences, grass) and generally suffer from imprecise boundary details. In this paper, LEMNet is proposed to handle the imprecise boundary problem by guiding the generation of text boundaries with a prior constraint. In the training stage, a Boundary Segmentation Branch is first constructed to predict a coarse boundary mask for each text instance. Then, by mapping pixels into an embedding space, the proposed Pixel Embedding Branch learns to make the embedding representations of boundary points more similar to one another, while enlarging the characteristic distance between background points and boundary points. During inference, noise in the coarse boundary segmentation map is effectively suppressed by a Noisy Point Suppression Algorithm operating on the pixel embedding vectors. In this way, LEMNet generates a more precise boundary description of text regions. To further enhance the distinguishability of boundary features, a Context Enhancement Module is proposed to capture feature interactions in different representation subspaces, in which attention is applied to features in parallel and the results are concatenated to produce enhanced features. Extensive experiments on four challenging datasets demonstrate the effectiveness of LEMNet. Specifically, LEMNet achieves F-measures of 85.2%, 87.6% and 85.2% on CTW1500, Total-Text and MSRA-TD500 respectively, setting a new state of the art.
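The Pixel Embedding Branch described above follows the general pattern of discriminative instance-embedding losses: pull boundary pixels of the same text instance toward a common embedding, and push background pixels away from it. The sketch below is a minimal, hypothetical NumPy illustration of such a hinged pull/push objective; the function name, margin values, and exact loss form are assumptions for illustration, not the paper's actual formulation.

```python
import numpy as np

def embedding_loss(embeddings, labels, delta_pull=0.5, delta_push=1.5):
    """Hypothetical hinged pull/push loss over per-pixel embeddings.

    embeddings: (N, D) array, one embedding vector per pixel.
    labels:     (N,) integer array; 0 = background, k > 0 = boundary
                pixels of text instance k.
    """
    instance_ids = [k for k in np.unique(labels) if k != 0]
    # Mean embedding per text-instance boundary.
    means = {k: embeddings[labels == k].mean(axis=0) for k in instance_ids}

    # Pull term: boundary pixels are drawn toward their instance mean
    # once they stray beyond the delta_pull margin.
    pull = 0.0
    for k in instance_ids:
        d = np.linalg.norm(embeddings[labels == k] - means[k], axis=1)
        pull += np.mean(np.maximum(d - delta_pull, 0.0) ** 2)
    pull /= max(len(instance_ids), 1)

    # Push term: background pixels are repelled from every boundary mean
    # until they are at least delta_push away.
    bg = embeddings[labels == 0]
    push = 0.0
    if len(bg) and instance_ids:
        for k in instance_ids:
            d = np.linalg.norm(bg - means[k], axis=1)
            push += np.mean(np.maximum(delta_push - d, 0.0) ** 2)
        push /= len(instance_ids)

    return pull + push
```

With this structure, well-separated boundary and background embeddings incur zero loss, while background pixels whose embeddings resemble boundary embeddings (the text-like patterns mentioned in the abstract) are penalized, which is the property the Noisy Point Suppression Algorithm then exploits at inference time.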
Pages: 3129-3143
Page count: 15