Boundary-Aware Arbitrary-Shaped Scene Text Detector With Learnable Embedding Network

Cited by: 7
Authors
Xing, Mengting [1 ]
Xie, Hongtao [1 ]
Tan, Qingfeng [2 ]
Fang, Shancheng [1 ]
Wang, Yuxin [1 ]
Zha, Zhengjun [1 ]
Zhang, Yongdong [1 ]
Affiliations
[1] Univ Sci & Technol China, Sch Informat Sci & Technol, Hefei 230052, Anhui, Peoples R China
[2] Guangzhou Univ, Cyberspace Inst Adv Technol, Guangzhou 511442, Guangdong, Peoples R China
Keywords
Feature extraction; Task analysis; Proposals; Noise measurement; Detectors; Shape; Noise reduction; Scene text detection; boundary representation; false positive suppression;
DOI
10.1109/TMM.2021.3093727
CLC number
TP [Automation Technology, Computer Technology];
Subject classification code
0812 ;
Abstract
Benefiting from advances in deep learning, scene text detection algorithms have developed rapidly in recent years. Methods that represent text regions with segmentation maps have been shown to capture arbitrary-shaped text more flexibly and accurately. However, such segmentation-based methods are easily disturbed by text-like background patterns (e.g., fences, grass) and generally suffer from imprecise boundary details. In this paper, LEMNet is proposed to handle the imprecise boundary problem by guiding the generation of text boundaries with a prior constraint. In the training stage, a Boundary Segmentation Branch is first constructed to predict a coarse boundary mask for each text instance. Then, by mapping pixels into an embedding space, the proposed Pixel Embedding Branch learns to make the embedding representations of boundary points more similar to one another, while enlarging the characteristic distance between background points and boundary points. During inference, noise in the coarse boundary segmentation map is effectively suppressed by a Noisy Point Suppression Algorithm operating on the pixel embedding vectors. In this way, LEMNet generates a more precise boundary description of text regions. To further enhance the distinguishability of boundary features, a Context Enhancement Module is proposed to capture feature interactions in different representation subspaces, in which attention is applied to features in parallel and the results are concatenated to produce enhanced features. Extensive experiments on four challenging datasets demonstrate the effectiveness of LEMNet. Specifically, LEMNet achieves F-measures of 85.2%, 87.6% and 85.2% on CTW1500, Total-Text and MSRA-TD500 respectively, setting a new state of the art.
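The Pixel Embedding Branch described above follows the general pattern of discriminative instance-embedding losses: pull boundary pixels of the same text instance toward a common embedding, and push background pixels away from it. The sketch below is a minimal, hypothetical NumPy illustration of such a hinged pull/push objective; the function name, margin values, and exact loss form are assumptions for illustration, not the paper's actual formulation.

```python
import numpy as np

def embedding_loss(embeddings, labels, delta_pull=0.5, delta_push=1.5):
    """Hypothetical hinged pull/push loss over per-pixel embeddings.

    embeddings: (N, D) array, one embedding vector per pixel.
    labels:     (N,) integer array; 0 = background, k > 0 = boundary
                pixels of text instance k.
    """
    instance_ids = [k for k in np.unique(labels) if k != 0]
    # Mean embedding per text-instance boundary.
    means = {k: embeddings[labels == k].mean(axis=0) for k in instance_ids}

    # Pull term: boundary pixels are drawn toward their instance mean
    # once they stray beyond the delta_pull margin.
    pull = 0.0
    for k in instance_ids:
        d = np.linalg.norm(embeddings[labels == k] - means[k], axis=1)
        pull += np.mean(np.maximum(d - delta_pull, 0.0) ** 2)
    pull /= max(len(instance_ids), 1)

    # Push term: background pixels are repelled from every boundary mean
    # until they are at least delta_push away.
    bg = embeddings[labels == 0]
    push = 0.0
    if len(bg) and instance_ids:
        for k in instance_ids:
            d = np.linalg.norm(bg - means[k], axis=1)
            push += np.mean(np.maximum(delta_push - d, 0.0) ** 2)
        push /= len(instance_ids)

    return pull + push
```

With this structure, well-separated boundary and background embeddings incur zero loss, while background pixels whose embeddings resemble boundary embeddings (the text-like patterns mentioned in the abstract) are penalized, which is the property the Noisy Point Suppression Algorithm then exploits at inference time.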
Pages: 3129-3143
Page count: 15