TextDiff: Enhancing scene text image super-resolution with mask-guided residual diffusion models

Cited by: 0
Authors
Liu, Baolin [1 ]
Yang, Zongyuan [1 ]
Chiu, Chinwai [1 ]
Xiong, Yongping [1 ]
Affiliations
[1] Beijing Univ Post & Telecommun, State Key Lab Switching & Networking Technol, Beijing 100876, Peoples R China
Keywords
Scene text image super-resolution; Text enhancement; Diffusion model; Multi-stage learning; Model expandability
DOI
10.1016/j.patcog.2025.111513
CLC Number
TP18 [Theory of Artificial Intelligence]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
The goal of scene text image super-resolution (STISR) is to reconstruct high-resolution text-line images from unrecognizable low-resolution inputs. Existing methods that rely on optimizing pixel-level losses tend to produce noticeably blurred text edges, which substantially degrades both the readability and the recognizability of the text. To address these issues, we propose TextDiff, the first diffusion-based framework tailored for STISR. It contains two modules: the Text Enhancement Module (TEM) and the Mask-Guided Residual Diffusion Module (MRD). The TEM generates an initial deblurred text image and a mask that encodes the spatial location of the text. The MRD sharpens the text edges by modeling the residuals between the ground-truth images and the initial deblurred images. Extensive experiments demonstrate that TextDiff achieves state-of-the-art (SOTA) performance on public benchmark datasets, with a maximum improvement of 2.0% in recognition accuracy over existing methods, while enhancing the readability of scene text images. Moreover, the proposed MRD module is a plug-and-play component that effectively sharpens the text edges produced by SOTA methods; this improves the readability and recognizability of their results without requiring any additional joint training.
Pages: 14
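
The abstract describes a two-stage pipeline: a TEM that predicts an initial deblurred image plus a text-location mask, followed by a mask-guided diffusion model over the residual to the ground truth. Below is a minimal PyTorch-style sketch of that flow. The simple convolutional stand-ins for TEM and MRD, the channel-concatenation conditioning, and the DDPM-style noise schedule are illustrative assumptions, not the authors' architecture.

# Minimal sketch of the TextDiff two-stage flow described in the abstract.
# All layer shapes and the diffusion schedule are illustrative assumptions.
import torch
import torch.nn as nn


class TEM(nn.Module):
    """Text Enhancement Module (sketch): initial deblurred image + text mask."""

    def __init__(self, channels: int = 64):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
        )
        self.to_image = nn.Conv2d(channels, 3, 3, padding=1)  # initial deblurred RGB image
        self.to_mask = nn.Conv2d(channels, 1, 3, padding=1)   # spatial text-location mask

    def forward(self, lr_up: torch.Tensor):
        feats = self.backbone(lr_up)
        return self.to_image(feats), torch.sigmoid(self.to_mask(feats))


class MRD(nn.Module):
    """Mask-Guided Residual Diffusion (sketch): denoises the residual
    (HR minus initial estimate), conditioned on the estimate and the mask."""

    def __init__(self, channels: int = 64, timesteps: int = 1000):
        super().__init__()
        self.timesteps = timesteps
        # Input channels: noisy residual (3) + initial estimate (3) + mask (1) = 7.
        self.denoiser = nn.Sequential(
            nn.Conv2d(7, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, 3, 3, padding=1),
        )
        betas = torch.linspace(1e-4, 2e-2, timesteps)
        self.register_buffer("alphas_cumprod", torch.cumprod(1.0 - betas, dim=0))

    def forward(self, residual, init_img, mask, t):
        # Standard DDPM forward noising of the residual, then noise prediction.
        a_bar = self.alphas_cumprod[t].view(-1, 1, 1, 1)
        noise = torch.randn_like(residual)
        noisy_res = a_bar.sqrt() * residual + (1.0 - a_bar).sqrt() * noise
        pred_noise = self.denoiser(torch.cat([noisy_res, init_img, mask], dim=1))
        return pred_noise, noise


if __name__ == "__main__":
    lr_up = torch.randn(2, 3, 32, 128)  # upsampled low-resolution text-line images
    hr = torch.randn(2, 3, 32, 128)     # ground-truth high-resolution images
    tem, mrd = TEM(), MRD()
    init_img, mask = tem(lr_up)
    t = torch.randint(0, mrd.timesteps, (2,))
    pred_noise, noise = mrd(hr - init_img, init_img, mask, t)
    loss = nn.functional.mse_loss(pred_noise, noise)  # simple epsilon-prediction loss
    print(loss.item())

Because the diffusion stage operates only on the residual of an already-plausible estimate, the same sampler can in principle be attached to outputs of other super-resolution models, which is consistent with the plug-and-play use of MRD described above.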