Batch-transformer for scene text image super-resolution

Times cited: 2
Authors
Sun, Yaqi [1, 3]
Xie, Xiaolan [1, 2]
Li, Zhi [1]
Yang, Kai [3]
Affiliations
[1] Guangxi Normal Univ, Sch Comp Sci & Engn, Guilin, Guangxi, Peoples R China
[2] Guilin Univ Technol, Sch Informat Sci & Engn, Guilin, Guangxi, Peoples R China
[3] Hengyang Normal Univ, Sch Comp Sci & Technol, Hengyang, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Computer vision; Super-resolution; Scene text image; Batch-transformer; Loss function; Network
DOI
10.1007/s00371-024-03598-7
Chinese Library Classification number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Recognizing low-resolution text images is challenging because they often lose detailed information, which leads to poor recognition accuracy. Moreover, traditional methods based on deep convolutional neural networks (CNNs) are not effective enough for some low-resolution text images with dense characters. To address this problem, this paper proposes a novel CNN-based batch-transformer network for scene text image super-resolution (BT-STISR). A pre-trained text prior module is employed to extract the text information needed for text reconstruction. A novel two-pipeline batch-transformer-based module is then proposed, which leverages self-attention and global attention mechanisms so that the text prior guides the text reconstruction process. Experiments on the TextZoom benchmark dataset show that BT-STISR achieves state-of-the-art performance in terms of structural similarity (SSIM) and peak signal-to-noise ratio (PSNR) compared with several recent methods.
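As an illustration of the PSNR metric named in the abstract, the following is a minimal sketch of the standard definition, PSNR = 10 * log10(MAX^2 / MSE). It is not the authors' evaluation code; the function name `psnr`, the `max_value` parameter, and the nested-list image representation are assumptions made for this example.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized grayscale images.

    Images are given as lists of rows of pixel intensities (an assumption
    made for this sketch, to avoid any third-party dependency).
    """
    if len(reference) != len(restored) or any(
        len(ref_row) != len(res_row) for ref_row, res_row in zip(reference, restored)
    ):
        raise ValueError("images must have the same shape")

    squared_error = 0.0
    count = 0
    for ref_row, res_row in zip(reference, restored):
        for ref_px, res_px in zip(ref_row, res_row):
            squared_error += (ref_px - res_px) ** 2
            count += 1
    if count == 0:
        raise ValueError("images must not be empty")

    mse = squared_error / count
    if mse == 0:
        # Identical images: the ratio is unbounded.
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    high_res = [[100, 100], [100, 100]]
    super_resolved = [[110, 90], [100, 100]]
    print(f"PSNR: {psnr(high_res, super_resolved):.2f} dB")
```

A higher value means the restored image is closer to the reference. SSIM, the other metric named in the abstract, compares local luminance, contrast, and structure instead of raw pixel error and is not covered by this sketch.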
Pages: 7399-7409
Number of pages: 11