Background-Insensitive Scene Text Recognition with Text Semantic Segmentation

Cited by: 8
Authors
Zhao, Liang [1]
Wu, Zhenyao [1]
Wu, Xinyi [1]
Wilsbacher, Greg [1]
Wang, Song [1]
Affiliations
[1] Univ South Carolina, Columbia, SC 29201 USA
Source
COMPUTER VISION, ECCV 2022, PT XXV | 2022 / Vol. 13685
Funding
National Science Foundation (USA)
Keywords
Scene text recognition; Semantic segmentation; EFFICIENT
DOI
10.1007/978-3-031-19806-9_10
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Scene Text Recognition (STR) has many important applications in computer vision. Complex backgrounds remain a major challenge for STR because they interfere with text feature extraction. Many existing methods use attentional regions, bounding boxes, or polygons to reduce such interference. However, the text regions located by these methods still contain considerable background interference. In this paper, we propose a background-insensitive approach, BINet, that explicitly leverages text Semantic Segmentation (SSN) to extract text more accurately. SSN is trained on a set of existing segmentation data whose volume is only 0.03% of the STR training data. This avoids the need for large-scale pixel-level annotation of the STR training data. To use the segmentation cues effectively, we design new segmentation refinement and embedding blocks for refining text masks and reinforcing visual features. Additionally, we propose an efficient pipeline that uses Synthetic Initialization (SI) for STR models trained only on real data (1.7% of the STR training data), instead of training on both synthetic and real data from scratch. Experiments show that the proposed method recognizes text against complex backgrounds more effectively, achieving state-of-the-art performance on several public datasets.
Pages: 163-182
Page count: 20