Real-Time Text Detection with Multi-level Feature Fusion and Pixel Clustering

Cited by: 0
Authors
Xu, Lu [1]
Jiang, Zhufeng [1]
Han, Xingyu [1]
Wang, Hui [1]
Fan, Zizhu [2]
Affiliations
[1] East China Jiaotong Univ, Nanchang 330013, Jiangxi, Peoples R China
[2] Shanghai Univ Elect Power, Shanghai 201306, Peoples R China
Source
PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2024, PT VII | 2025 / Vol. 15037
Keywords
Real-time text detection; Feature enhancement; Pixel clustering
DOI
10.1007/978-981-97-8511-7_2
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recent segmentation techniques for scene text detection have attracted significant attention for their ability to flexibly handle text of various shapes and orientations. These techniques benefit from the high extensibility of pixel-level representations, but they face challenges due to complex network designs and slow post-processing steps, which hinder inference speed and lead to suboptimal detection of unusual text shapes. To address these challenges, we present a real-time detector for text of arbitrary shapes, named the Multi-Level Feature Fusion and Pixel Clustering (MFFPC) Network. The method uses a lightweight feature extraction network, enhanced with a specially designed Feature Enhancement Module (FEM) and Feature Filter Module (FFM) to improve feature representation. MFFPC enhances visual context understanding and refines lower-level feature maps using high-level features, effectively modeling text through a lightweight segmentation head and GPU-accelerated parallel post-processing. Additionally, an auxiliary training branch, inspired by clustering algorithms, further increases segmentation accuracy. The performance of MFFPC on three benchmark datasets validates its effectiveness. Specifically, on the challenging Total-Text dataset, it achieves an F-measure of 88.7% and runs at 66.8 frames per second (FPS).
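The two headline figures in the abstract can be read with a little arithmetic. The following is a minimal sketch, not the authors' evaluation code: it assumes the F-measure is the usual harmonic mean of precision and recall, and the function names `f_measure` and `latency_ms` as well as the precision/recall pair are hypothetical, chosen only for illustration. The only number taken from the abstract is 66.8 FPS.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (assumed definition of F-measure)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when nothing is detected
    return 2 * precision * recall / (precision + recall)


def latency_ms(fps: float) -> float:
    """Average per-image time budget, in milliseconds, implied by a frame rate."""
    return 1000.0 / fps


if __name__ == "__main__":
    # Hypothetical precision/recall pair; NOT values reported in the paper.
    p, r = 0.90, 0.80
    print(f"F-measure for P={p}, R={r}: {f_measure(p, r):.4f}")

    # 66.8 FPS is the speed quoted in the abstract for Total-Text.
    print(f"Per-image budget at 66.8 FPS: {latency_ms(66.8):.2f} ms")
```

At 66.8 FPS the whole pipeline, including post-processing, has roughly 15 ms per image on average, which is why the abstract stresses a lightweight head and parallel post-processing.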
Pages: 16-29
Page count: 14