Real-Time Text Detection with Multi-level Feature Fusion and Pixel Clustering

Citations: 0

Authors:

Xu, Lu [1]

Jiang, Zhufeng [1]

Han, Xingyu [1]

Wang, Hui [1]

Fan, Zizhu [2]

Affiliations:

[1] East China Jiaotong University, Nanchang 330013, Jiangxi, People's Republic of China

[2] Shanghai University of Electric Power, Shanghai 201306, People's Republic of China

Source:

PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2024, PT VII | 2025 / Vol. 15037

Keywords:

Real-time text detection; Feature enhancement; Pixel clustering

DOI:

10.1007/978-981-97-8511-7_2

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Recent segmentation techniques for scene text detection have attracted significant attention for their ability to flexibly handle text of various shapes and orientations. These techniques benefit from the high extensibility of pixel-level representations, but they rely on complex network designs and slow post-processing steps, which reduce inference speed and lead to suboptimal detection of unusual text shapes. To address these challenges, we present a real-time detector for text of arbitrary shapes, named the Multi-Level Feature Fusion and Pixel Clustering (MFFPC) Network. The method uses a lightweight feature extraction network, enhanced with a specially designed Feature Enhancement Module (FEM) and Feature Filter Module (FFM) to improve feature representation. MFFPC enhances visual context understanding and refines lower-level feature maps using high-level features, modeling text through a lightweight segmentation head and GPU-accelerated parallel post-processing. Additionally, an auxiliary training branch, inspired by clustering algorithms, further increases segmentation accuracy. The performance of MFFPC on three benchmark datasets validates its effectiveness. Specifically, on the challenging Total-Text dataset, it achieves an F-measure of 88.7% at a speed of 66.8 frames per second (FPS).

Pages: 16-29

Page count: 14

