Rethinking text rectification for scene text recognition

Cited by: 6
Authors
Ke, Wenjun [1]
Wei, Jianguo [1, 2]
Hou, Qingzhi [3]
Feng, Hui [4]
Affiliations
[1] Tianjin Univ, Coll Intelligence & Comp, Tianjin, Peoples R China
[2] Qinghai Minzu Univ, Dept Comp, Xining, Peoples R China
[3] Tianjin Univ, State Key Lab Hydraul Engn Simulat & Safety, Tianjin, Peoples R China
[4] Tianjin Univ, Sch Foreign Languages, Tianjin, Peoples R China
Keywords
Text recognition; Text rectification; Rectification network; Spatial transformer network;
DOI
10.1016/j.eswa.2023.119647
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Existing scene text recognition methods incorporate text rectification to reduce text irregularity in images and improve recognition accuracy. Previous text rectification methods aim to convert an irregular text image into a regular form that is easier to recognize. In this study, we explore text rectification for text recognition and identify two issues that all previous works have ignored: performance degradation of the recognition network, and unreliable text rectification. We therefore rethink what causes these two issues and propose a rectification-based text recognition network to mitigate them. The proposed network combines text rectification with text recognition and includes a multi-level feature aggregation module to enhance feature learning for character representation. Concretely, we devise a mixed batch training strategy to address the performance degradation of the recognition network, and design a confidence decoding scheme to avoid unreliable text rectification. Extensive ablation studies verify the positive role of the feature aggregation module in feature learning and the effectiveness of the proposed training strategy and decoding scheme in addressing the two issues. In experiments, the proposed method outperforms state-of-the-art results on public benchmarks.
Pages: 9