TSRGAN: Real-world text image super-resolution based on adversarial learning and triplet attention

Cited by: 19
Authors
Fang, Chuantao [1]
Zhu, Yu [1]
Liao, Lei [1]
Ling, Xiaofeng [1]
Affiliation
[1] East China Univ Sci & Technol, Sch Informat Sci & Engn, Shanghai 200237, Peoples R China
Funding
Natural Science Foundation of Shanghai
Keywords
Text image super-resolution; Adversarial learning; Triplet attention; Wavelet loss; Scene text recognition; NEURAL-NETWORK; SCENE;
DOI
10.1016/j.neucom.2021.05.060
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The text in a low-resolution (LR) image is usually hard to read, and super-resolution (SR) is an intuitive solution to this issue. Existing single-image super-resolution (SISR) models are mainly trained on synthetic datasets whose LR images are obtained by applying bicubic interpolation or Gaussian blur to high-resolution (HR) images. However, these models can hardly generalize to practical scenarios because real-world LR images are more difficult to super-resolve. The newly proposed TextZoom dataset is the first dataset for real-world text image super-resolution. We propose a new model, termed TSRGAN, trained on this dataset. First, a discriminator is designed to prevent the SR network from generating over-smoothed images. Second, we introduce triplet attention into the SR network for better representational ability. Moreover, besides L2 loss and adversarial loss, a wavelet loss is incorporated to help reconstruct sharper character edges. Since TextZoom provides text labels, the recognition accuracy of a scene text recognition (STR) model can be used to evaluate the quality of SR images. This accuracy reflects the performance of text image SR models better than traditional SR evaluation metrics such as PSNR and SSIM. Comprehensive experiments show the superiority of our TSRGAN. Compared with the state-of-the-art method, the proposed TSRGAN improves the average recognition accuracy of ASTER, MORAN and CRNN on TextZoom by 0.8%, 1.5% and 3.2%, respectively. © 2021 Elsevier B.V. All rights reserved.
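The abstract names three generator loss terms (L2, adversarial, wavelet) without giving their form. The following is only a minimal sketch of how such a combination might look, not the authors' implementation. It assumes a one-level Haar transform, a squared-error penalty on the three detail sub-bands, and a non-saturating adversarial term. The function names `haar_dwt2`, `wavelet_loss`, `generator_loss` and the weights `w_adv`, `w_wav` are hypothetical and do not come from the paper.

```python
import numpy as np


def haar_dwt2(img):
    """One-level 2D Haar transform of an (H, W) array with even H and W.

    Returns four sub-bands of shape (H/2, W/2): the low-pass approximation
    and three detail bands (horizontal, vertical, diagonal differences).
    """
    a = img[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = img[0::2, 1::2]  # top-right
    c = img[1::2, 0::2]  # bottom-left
    d = img[1::2, 1::2]  # bottom-right
    approx = (a + b + c + d) / 2.0
    horiz = (a - b + c - d) / 2.0  # left-right differences (vertical strokes)
    vert = (a + b - c - d) / 2.0   # top-bottom differences (horizontal strokes)
    diag = (a - b - c + d) / 2.0
    return approx, horiz, vert, diag


def wavelet_loss(sr, hr):
    """Mean squared error over the three detail sub-bands (assumed form)."""
    sr_detail = haar_dwt2(sr)[1:]
    hr_detail = haar_dwt2(hr)[1:]
    per_band = [np.mean((s - h) ** 2) for s, h in zip(sr_detail, hr_detail)]
    return float(np.mean(per_band))


def generator_loss(sr, hr, d_fake_prob, w_adv=1e-3, w_wav=1e-2):
    """Weighted sum of L2, adversarial and wavelet terms.

    d_fake_prob holds the discriminator's probabilities that SR images are
    real; the weights are illustrative placeholders, not values from the paper.
    """
    l2 = float(np.mean((sr - hr) ** 2))
    adv = float(-np.mean(np.log(d_fake_prob + 1e-12)))
    return l2 + w_adv * adv + w_wav * wavelet_loss(sr, hr)
```

In this sketch, an over-smoothed output (for example a flat gray patch) has empty detail sub-bands, so the wavelet term penalizes it whenever the HR target contains character edges.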
Pages: 88-96
Page count: 9