Batch-transformer for scene text image super-resolution

Cited by: 2
Authors
Sun, Yaqi [1, 3]
Xie, Xiaolan [1, 2]
Li, Zhi [1]
Yang, Kai [3]
Affiliations
[1] Guangxi Normal Univ, Sch Comp Sci & Engn, Guilin, Guangxi, Peoples R China
[2] Guilin Univ Technol, Sch Informat Sci & Engn, Guilin, Guangxi, Peoples R China
[3] Hengyang Normal Univ, Sch Comp Sci & Technol, Hengyang, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Computer vision; Super-resolution; Scene text image; Batch-transformer; Loss function; NETWORK;
DOI
10.1007/s00371-024-03598-7
Chinese Library Classification (CLC) number
TP31 [Computer software];
Discipline classification codes
081202; 0835;
Abstract
Recognizing low-resolution text images is challenging because they often lose fine detail, which leads to poor recognition accuracy. Moreover, traditional methods based on deep convolutional neural networks (CNNs) are not effective enough for some low-resolution text images with dense characters. This paper proposes a novel CNN-based batch-transformer network for scene text image super-resolution (BT-STISR) to address this problem. A pre-trained text prior module is employed to extract the text information needed for text reconstruction. A novel two-pipeline batch-transformer-based module then leverages self-attention and global attention mechanisms so that the text prior guides the text reconstruction process. Experiments on the benchmark dataset TextZoom show that BT-STISR achieves state-of-the-art performance in terms of structural similarity (SSIM) and peak signal-to-noise ratio (PSNR) compared with several recent methods.
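For readers unfamiliar with the PSNR metric named in the abstract, the following is a minimal sketch of the standard definition, PSNR = 10 log10(MAX^2 / MSE), applied to two small grayscale images. It is not the authors' evaluation code: the function name `psnr`, the `max_value` parameter, and the sample pixel values are illustrative assumptions.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Return the PSNR in dB between two equally sized grayscale images.

    Images are lists of rows of pixel intensities. `max_value` is the
    largest possible intensity (assumed 255 for 8-bit images).
    """
    same_rows = len(reference) == len(restored)
    if not same_rows or any(len(a) != len(b) for a, b in zip(reference, restored)):
        raise ValueError("images must have the same shape")

    pixel_count = sum(len(row) for row in reference)
    squared_error = sum(
        (a - b) ** 2
        for ref_row, res_row in zip(reference, restored)
        for a, b in zip(ref_row, res_row)
    )
    mse = squared_error / pixel_count
    if mse == 0:
        # Identical images: the ratio is unbounded.
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical 3x3 patches: a high-resolution reference and a restored output.
hr = [[52, 55, 61], [59, 79, 61], [76, 62, 59]]
sr = [[54, 55, 60], [59, 77, 61], [75, 62, 60]]
print(f"PSNR: {psnr(hr, sr):.2f} dB")
```

A higher PSNR means the restored image is closer to the reference in a pixel-wise sense. SSIM complements it by comparing local structure rather than raw pixel error.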
Pages: 7399-7409
Page count: 11