Single-Character-Based Embedding Feature Aggregation Using Cross-Attention for Scene Text Super-Resolution

Cited by: 0
Authors
Wang, Meng [1]
Li, Qianqian [1]
Liu, Haipeng [1]
Affiliations
[1] Kunming Univ Sci & Technol, Sch Informat Engn & Automat, Kunming 650500, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
scene text image super-resolution; cross-attention; cross-fertilization; text recognition; IMAGE SUPERRESOLUTION; RECOGNITION; NETWORK;
DOI
10.3390/s25072228
Chinese Library Classification (CLC) number
O65 [Analytical Chemistry]
Discipline classification codes
070302; 081704
Abstract
In text-centric vision scenarios, super-resolution aims to improve the quality and readability of text in images to facilitate downstream tasks. However, the ambiguity of character regions against complex backgrounds remains difficult to mitigate, particularly the interference between tightly connected characters. To address this problem, we propose single-character-based embedding feature aggregation using cross-attention for scene text image super-resolution (SCE-STISR). First, a dynamic feature extraction mechanism adaptively captures shallow features by adjusting multi-scale feature weights according to spatial representations. Next, during text-image interaction, a dual-level cross-attention mechanism aggregates the cropped single-character features with the textual prior while aligning semantic sequences with visual features. Finally, an adaptive normalized color correction operation mitigates color distortion caused by background interference. On the TextZoom benchmark, the recognition accuracies obtained with three different recognizers are 53.6%, 60.9%, and 64.5%, an improvement of 0.9-1.4% over the TATT baseline, with the best SSIM of 0.7951 and a PSNR of 21.84. In addition, our approach improves accuracy by 0.2-2.2% over existing baselines on five text recognition datasets, validating the effectiveness of the model.
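As a rough illustration of the cross-attention idea named in the abstract, the sketch below shows generic single-head scaled dot-product cross-attention in NumPy, with queries taken from visual (character) features and keys and values taken from a textual prior. It is not the SCE-STISR implementation: the function names, feature shapes, and random projection weights are all assumptions made for this example, and the paper's dual-level design is not reproduced here.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def cross_attention(visual_feats, text_prior, w_q, w_k, w_v):
    """Generic single-head cross-attention (hypothetical helper).

    visual_feats: (n_visual, d_visual) features, e.g. from cropped characters
    text_prior:   (n_text, d_text) textual prior sequence
    Returns the fused features and the attention weights.
    """
    q = visual_feats @ w_q                      # (n_visual, d)
    k = text_prior @ w_k                        # (n_text, d)
    v = text_prior @ w_v                        # (n_text, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])     # (n_visual, n_text)
    weights = softmax(scores, axis=-1)          # each row sums to 1
    return weights @ v, weights


# Toy shapes chosen only for illustration (assumptions, not from the paper).
rng = np.random.default_rng(0)
n_visual, n_text, d_visual, d_text, d = 8, 5, 12, 10, 16
visual_feats = rng.normal(size=(n_visual, d_visual))
text_prior = rng.normal(size=(n_text, d_text))
w_q = rng.normal(size=(d_visual, d))
w_k = rng.normal(size=(d_text, d))
w_v = rng.normal(size=(d_text, d))

fused, attn = cross_attention(visual_feats, text_prior, w_q, w_k, w_v)
print(fused.shape, attn.shape)
```

Each row of the attention matrix is a distribution over the textual-prior positions, so every visual feature receives a weighted mix of text-side values. A trained model would learn the projection matrices rather than sample them at random.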
Pages: 23