Towards Robust Scene Text Image Super-resolution via Explicit Location Enhancement

被引：0

作者：

Guo, Hang ^{[1
]}

Dai, Tao ^{[2
]}

Meng, Guanghao ^{[1
,3
]}

Xia, Shu-Tao ^{[1
,3
]}

机构：

[1] Tsinghua Univ, Tsinghua Shenzhen Int Grad Sch, Beijing, Peoples R China

[2] Shenzhen Univ, Coll Comp Sci & Software Engn, Shenzhen, Peoples R China

[3] Peng Cheng Lab, Shenzhen, Peoples R China

来源：

PROCEEDINGS OF THE THIRTY-SECOND INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2023 | 2023年

基金：

中国国家自然科学基金;

关键词：

RECOGNITION; NETWORK;

D O I：

暂无

中图分类号：

TP18 [人工智能理论];

学科分类号：

081104 ; 0812 ; 0835 ; 1405 ;

摘要：

Scene text image super-resolution (STISR), aiming to improve image quality while boosting down-stream scene text recognition accuracy, has recently achieved great success. However, most existing methods treat the foreground (character regions) and background (non-character regions) equally in the forward process, and neglect the disturbance from the complex background, thus limiting the performance. To address these issues, in this paper, we propose a novel method LEMMA that explicitly models character regions to produce high-level text-specific guidance for super-resolution. To model the location of characters effectively, we propose the location enhancement module to extract character region features based on the attention map sequence. Besides, we propose the multi-modal alignment module to perform bidirectional visual-semantic alignment to generate high-quality prior guidance, which is then incorporated into the super-resolution branch in an adaptive manner using the proposed adaptive fusion module. Experiments on TextZoom and four scene text recognition benchmarks demonstrate the superiority of our method over other state-of-the-art methods. Code is available at https://github.com/csguoh/LEMMA.

引用

页码：782 / 790

页数：9

共 41 条

[1] Toward Real-World Single Image Super-Resolution: A New Benchmark and A New Model [J].

Cai, Jianrui ;

Zeng, Hui ;

Yong, Hongwei ;

Cao, Zisheng ;

Zhang, Lei .

2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, :3086-3095

[2]

Chen Jieneng, 2021, Computer vision and pattern recognition

[3]

Chen JY, 2022, AAAI CONF ARTIF INTE, P285

[4] Focusing Attention: Towards Accurate Text Recognition in Natural Images [J].

Cheng, Zhanzhan ;

Bai, Fan ;

Xu, Yunlu ;

Zheng, Gang ;

Pu, Shiliang ;

Zhou, Shuigeng .

2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017, :5086-5094

[5] Xception: Deep Learning with Depthwise Separable Convolutions [J].

Chollet, Francois .

30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, :1800-1807

[6] Second-order Attention Network for Single Image Super-Resolution [J].

Dai, Tao ;

Cai, Jianrui ;

Zhang, Yongbing ;

Xia, Shu-Tao ;

Zhang, Lei .

2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, :11057-11066

[7]

Dong C, 2015, Arxiv, DOI arXiv:1506.02211

[8] Image Super-Resolution Using Deep Convolutional Networks [J].

Dong, Chao ;

Loy, Chen Change ;

He, Kaiming ;

Tang, Xiaoou .

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2016, 38 (02) :295-307

[9] Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition [J].

Fang, Shancheng ;

Xie, Hongtao ;

Wang, Yuxin ;

Mao, Zhendong ;

Zhang, Yongdong .

2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, :7094-7103

[10]

Liem HD, 2018, PROCEEDINGS OF 2018 5TH NAFOSTED CONFERENCE ON INFORMATION AND COMPUTER SCIENCE (NICS 2018), P338, DOI 10.1109/NICS.2018.8606831

← 1 2 3 4 5 →