Pragmatic degradation learning for scene text image super-resolution with data-training strategy

Cited by: 5
Authors
Yang, Shengying [1]
Xie, Lifeng [1]
Ran, Xiaoxiao [2]
Lei, Jingsheng [1]
Qian, Xiaohong [1]
Affiliations
[1] Zhejiang Univ Sci & Technol, Sch Informat & Elect Engn, Hangzhou 310023, Peoples R China
[2] COMAC Shanghai Aircraft Mfg Co Ltd, 5G Innovat Ctr, Shanghai 200120, Peoples R China
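Funding
National Natural Science Foundation of China
Keywords
Text image super-resolution; Scene text recognition; Deep neural network; Degradation; Network
DOI
10.1016/j.knosys.2023.111349
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract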
Super-resolution of scene text images is a difficult computational problem beset by many intricate challenges. This paper focuses on the specific hurdles that have impeded progress in this domain and introduces the Higher-Order Degradation-Based Super-Resolution Network (HDSN) to address them. The challenges in super-resolving scene text images are manifold. First, the semantic ambiguity inherent to text in natural scenes often leads to degraded results, as standard super-resolution techniques struggle to preserve meaningful textual content. Second, font variability exacerbates this issue, since different fonts require distinct treatment for optimal super-resolution. Third, scene text images often exhibit long trailing shadows, artifacts, and strong noise, which conventional methods handle poorly. To tackle these challenges, we propose a pragmatic higher-order degradation modeling process. This process accounts for the characteristics of scene text images, including diverse noise types (Gaussian, Poisson, speckle, and JPEG compression noise) and varying levels of blur. By modeling these real-world conditions, our approach significantly improves the robustness and adaptability of scene text image super-resolution. We also address the sparsity of available datasets and the lack of paired images for training. To overcome this limitation, we introduce a text image pre-training strategy, which proves highly effective in improving recognition accuracy. Experimental results on TextZoom confirm the effectiveness of our approach, showing substantial improvements over existing methods.
Notably, HDSN achieves average recognition accuracies of 67.2% with ASTER, 63.2% with MORAN, and 58.0% with CRNN, surpassing existing approaches. Our source code is available at https://github.com/syyang2022/HDSN.
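The higher-order degradation process described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation (the linked repository is the authoritative source). It assumes grayscale images with values in [0, 1] and sides divisible by 8. Every function name, the default order of 2, and all parameter ranges are hypothetical choices made for illustration. We also assume that "higher-order" means repeating the degradation loop, as in Real-ESRGAN's high-order degradation model. Each round applies a Gaussian blur, one randomly chosen noise type (Gaussian, Poisson, or speckle), and a JPEG-like step. That last step is a simplified stand-in that uniformly quantizes 8×8 block DCT coefficients, not a real JPEG codec. The result is then downsampled ×2 by average pooling.

```python
# Hypothetical sketch; NOT the HDSN authors' code. All ranges are illustrative assumptions.
import numpy as np


def gaussian_kernel(sigma: float) -> np.ndarray:
    """Isotropic 2-D Gaussian kernel covering +/-3 sigma, normalized to sum to 1."""
    radius = int(np.ceil(3 * sigma))
    ax = np.arange(-radius, radius + 1)
    g = np.exp(-(ax ** 2) / (2 * sigma ** 2))
    k = np.outer(g, g)
    return k / k.sum()


def blur(img: np.ndarray, sigma: float) -> np.ndarray:
    """Filter a 2-D image with a (symmetric) Gaussian kernel using reflect padding."""
    k = gaussian_kernel(sigma)
    r = k.shape[0] // 2
    padded = np.pad(img, r, mode="reflect")
    out = np.zeros_like(img)
    h, w = img.shape
    for dy in range(k.shape[0]):
        for dx in range(k.shape[1]):
            out += k[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out


def add_noise(img: np.ndarray, kind: str, rng: np.random.Generator) -> np.ndarray:
    """Add one noise type with a hypothetical random strength; clip to [0, 1]."""
    if kind == "gaussian":
        img = img + rng.normal(0.0, rng.uniform(0.01, 0.1), img.shape)
    elif kind == "poisson":
        scale = 10 ** rng.uniform(1.5, 3.0)  # photon-count scale: lower means noisier
        img = rng.poisson(img * scale) / scale
    elif kind == "speckle":
        img = img + img * rng.normal(0.0, rng.uniform(0.01, 0.1), img.shape)
    else:
        raise ValueError(f"unknown noise type: {kind}")
    return np.clip(img, 0.0, 1.0)


def dct_matrix(n: int = 8) -> np.ndarray:
    """Orthonormal DCT-II basis matrix (rows are frequencies)."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m


def jpeg_like(img: np.ndarray, step: float) -> np.ndarray:
    """JPEG-style blocking: uniformly quantize 8x8 block DCT coefficients.
    Simplified stand-in: real JPEG uses frequency-dependent tables and entropy coding."""
    d = dct_matrix(8)
    h, w = img.shape  # both assumed divisible by 8
    blocks = img.reshape(h // 8, 8, w // 8, 8).transpose(0, 2, 1, 3)
    coeffs = d @ blocks @ d.T                    # forward 2-D DCT per block
    coeffs = np.round(coeffs / step) * step      # lossy quantization
    rec = d.T @ coeffs @ d                       # inverse 2-D DCT per block
    return np.clip(rec.transpose(0, 2, 1, 3).reshape(h, w), 0.0, 1.0)


def degrade(hr: np.ndarray, order: int = 2, rng=None) -> np.ndarray:
    """Repeat blur -> noise -> JPEG-like `order` times, then downsample x2."""
    rng = np.random.default_rng() if rng is None else rng
    img = np.clip(hr.astype(np.float64), 0.0, 1.0)
    for _ in range(order):
        img = blur(img, sigma=rng.uniform(0.2, 2.0))
        img = add_noise(img, rng.choice(["gaussian", "poisson", "speckle"]), rng)
        img = jpeg_like(img, step=rng.uniform(0.05, 0.3))
    h, w = img.shape
    # x2 downsampling by 2x2 average pooling
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
```

For example, `degrade(hr, rng=np.random.default_rng(0))` turns a 32×128 array into a 16×64 low-resolution counterpart. Running the loop more than once means later blur and noise act on already-compressed artifacts. This is the compounding effect a higher-order model is meant to capture.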
Pages: 14