TSRGAN: Real-world text image super-resolution based on adversarial learning and triplet attention

Cited by: 19
Authors
Fang, Chuantao [1]
Zhu, Yu [1]
Liao, Lei [1]
Ling, Xiaofeng [1]
Affiliations
[1] East China Univ Sci & Technol, Sch Informat Sci & Engn, Shanghai 200237, Peoples R China
Funding
Natural Science Foundation of Shanghai
Keywords
Text image super-resolution; Adversarial learning; Triplet attention; Wavelet loss; Scene text recognition; NEURAL-NETWORK; SCENE
DOI
10.1016/j.neucom.2021.05.060
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The text in a low-resolution (LR) image is usually hard to read, and super-resolution (SR) is an intuitive solution to this problem. Existing single image super-resolution (SISR) models are mainly trained on synthetic datasets whose LR images are obtained by applying bicubic interpolation or Gaussian blur to high-resolution (HR) images. However, these models can hardly generalize to practical scenarios because real-world LR images are more difficult to super-resolve. The newly proposed TextZoom dataset is the first dataset for real-world text image super-resolution. We propose a new model, termed TSRGAN, trained on this dataset. First, a discriminator is designed to prevent the SR network from generating over-smoothed images. Second, we introduce triplet attention into the SR network for better representational ability. Third, in addition to L2 loss and adversarial loss, a wavelet loss is incorporated to help reconstruct sharper character edges. Since TextZoom provides text labels, the recognition accuracy of scene text recognition (STR) models can be used to evaluate the quality of SR images. This accuracy reflects the performance of text image SR models better than traditional SR evaluation metrics such as PSNR and SSIM. Comprehensive experiments show the superiority of TSRGAN. Compared with the state-of-the-art method, the proposed TSRGAN improves the average recognition accuracy of ASTER, MORAN and CRNN on TextZoom by 0.8%, 1.5% and 3.2%, respectively. © 2021 Elsevier B.V. All rights reserved.
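The abstract names two ideas that a small example can make concrete: a wavelet loss that penalizes errors in edge detail, and recognition accuracy as a quality measure for SR output. The following is a minimal sketch, not the authors' implementation. It assumes a one-level Haar transform, a loss defined as the mean squared error over the three detail subbands, and case-insensitive exact matching of recognized words. The function names `haar_dwt2`, `wavelet_loss` and `word_accuracy` are hypothetical; the record does not specify the wavelet, the loss weighting, or the matching rule.

```python
import numpy as np


def haar_dwt2(img):
    """One-level 2D Haar transform of an (H, W) array with even H and W.

    Returns the approximation subband and three detail subbands
    (row differences, column differences, diagonal differences).
    """
    a = img[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = img[0::2, 1::2]  # top-right
    c = img[1::2, 0::2]  # bottom-left
    d = img[1::2, 1::2]  # bottom-right
    approx = (a + b + c + d) / 2.0
    row_detail = (a + b - c - d) / 2.0   # responds to horizontal edges
    col_detail = (a - b + c - d) / 2.0   # responds to vertical edges
    diag_detail = (a - b - c + d) / 2.0  # responds to diagonal structure
    return approx, row_detail, col_detail, diag_detail


def wavelet_loss(sr, hr):
    """Assumed form: mean of the per-subband MSEs over the detail subbands."""
    sr_details = haar_dwt2(np.asarray(sr, dtype=float))[1:]
    hr_details = haar_dwt2(np.asarray(hr, dtype=float))[1:]
    errors = [np.mean((s - h) ** 2) for s, h in zip(sr_details, hr_details)]
    return float(np.mean(errors))


def word_accuracy(predictions, labels):
    """Fraction of recognized words that match the label (case-insensitive)."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    hits = sum(p.lower() == t.lower() for p, t in zip(predictions, labels))
    return hits / len(labels)
```

In an evaluation loop under these assumptions, each SR image would be passed to a recognizer such as ASTER, MORAN or CRNN, and the predicted strings would be compared against the TextZoom labels with `word_accuracy`.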
Pages: 88-96
Page count: 9