PCBSNet: A Pure Convolutional Bilateral Segmentation Network for Real-Time Natural Scene Text Detection

Cited by: 6

Authors:

Lian, Zhe [1]

Yin, Yanjun [1]

Zhi, Min [1]

Xu, Qiaozhi [1]

Affiliation:

[1] Inner Mongolia Normal Univ, Coll Comp Sci & Technol, Hohhot 010022, Peoples R China

Source:

ELECTRONICS | 2023 / Vol. 12 / Issue 14

Keywords:

scene text detection; pure convolutional bilateral segmentation; efficient semantic extraction; efficient attention aggregation; feature enhancement; IMPROVED YOLOV5; RECOGNITION;

DOI:

10.3390/electronics12143055

Chinese Library Classification (CLC) code:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Scene text detection is a fundamental research topic in image processing with broad practical applications. Segmentation-based methods spend considerable time on feature processing, but their post-processing algorithms are excellent. Real-time semantic segmentation methods use lightweight backbone networks for feature extraction and aggregation but lack effective post-processing methods. Pure convolutional networks improve model performance by changing key components. Combining the advantages of these three types of methods, we propose a Pure Convolutional Bilateral Segmentation Network (PCBSNet) for real-time natural scene text detection. First, we construct a bilateral feature extraction backbone network to significantly improve detection speed. The low extraction detail branch captures spatial information, while the efficient semantic extraction branch accurately captures semantic features through a series of micro designs. Second, we build an efficient attention aggregation module to guide the efficient and adaptive aggregation of features from the two branches. The fused feature map then undergoes feature enhancement to obtain a more accurate and reliable feature representation. Finally, we use differentiable binarization post-processing to construct text instance boundaries. To evaluate the effectiveness of the proposed model, we compared it with mainstream lightweight models on three datasets: ICDAR2015, MSRA-TD500, and CTW1500. The F-measure scores were 82.9%, 82.8%, and 78.9%, respectively, and the inference speeds were 59.1, 94.3, and 75.5 frames per second (FPS). We also conducted extensive ablation experiments on the ICDAR2015 dataset to validate the rationality of the proposed improvements. The results indicate that the proposed model significantly improves inference speed while enhancing accuracy and is competitive with other advanced detection methods. However, the detection performance of PCBSNet on curved text still needs improvement.
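The abstract names differentiable binarization as the post-processing step without spelling it out. As a minimal sketch, assuming the standard formulation used by the DB text detector (an approximate binary map computed as a steep sigmoid of the probability map minus a threshold map, with amplification factor k = 50), the core operation might look as follows. The function name and the toy input maps are illustrative assumptions and are not taken from the PCBSNet paper.

```python
import math

def differentiable_binarization(prob, thresh, k=50.0):
    """Elementwise approximate binarization: 1 / (1 + exp(-k * (P - T))).

    prob   -- 2D list of text probabilities in [0, 1] (assumed input)
    thresh -- 2D list of per-pixel thresholds in [0, 1] (assumed input)
    k      -- amplification factor; 50 is the value commonly used with DB
    """
    return [
        [1.0 / (1.0 + math.exp(-k * (p - t))) for p, t in zip(p_row, t_row)]
        for p_row, t_row in zip(prob, thresh)
    ]

# Toy 2x2 maps, purely for illustration.
prob_map = [[0.9, 0.1], [0.5, 0.3]]
thresh_map = [[0.3, 0.3], [0.5, 0.3]]
binary_map = differentiable_binarization(prob_map, thresh_map)
```

Pixels whose probability is well above the threshold map to values near 1, those well below map to values near 0, and the function stays differentiable, which is what lets the threshold be learned jointly with the rest of the network.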


Pages: 23

References: 52 in total
