Class-Aware Mask-guided feature refinement for scene text recognition

Cited by: 12
Authors
Yang, Mingkun [1]
Yang, Biao [2]
Liao, Minghui [5]
Zhu, Yingying [1,3]
Bai, Xiang [4]
Affiliations
[1] Huazhong Univ Sci & Technol, Sch Elect Informat & Commun, Wuhan, Peoples R China
[2] Huazhong Univ Sci & Technol, Sch Artificial Intelligence & Automat, Wuhan, Peoples R China
[3] Huazhong Univ Sci & Technol, Hubei Key Lab Smart Internet Technol, Wuhan, Peoples R China
[4] Huazhong Univ Sci & Technol, Sch Software Engn, Wuhan, Peoples R China
[5] Huawei Inc, Shenzhen, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Text recognition; Text segmentation; Multi-task learning; Feature fusion; ATTENTION NETWORK;
DOI
10.1016/j.patcog.2023.110244
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104; 0812; 0835; 1405;
Abstract
Scene text recognition is a rapidly developing field that faces numerous challenges due to the complexity and diversity of scene text, including complex backgrounds, diverse fonts, flexible arrangements, and accidental occlusions. In this paper, we propose a novel approach called Class-Aware Mask-guided feature refinement (CAM) to address these challenges. Our approach introduces canonical class-aware glyph masks generated from a standard font to effectively suppress background and text style noise, thereby enhancing feature discrimination. Additionally, we design a feature alignment and fusion module to incorporate the canonical mask guidance for further feature refinement for text recognition. By enhancing the alignment between the canonical mask feature and the text feature, the module ensures more effective fusion, ultimately leading to improved recognition performance. We first evaluate CAM on six standard text recognition benchmarks to demonstrate its effectiveness. Furthermore, CAM exhibits superiority over the state-of-the-art method by an average performance gain of 4.1% across six more challenging datasets, despite utilizing a smaller model size. Our study highlights the importance of incorporating canonical mask guidance and aligned feature refinement techniques for robust scene text recognition. Code will be available at https://github.com/MelosY/CAM.
Pages: 14