C3-STISR: Scene Text Image Super-resolution with Triple Clues

Cited: 0
Authors
Zhao, Minyi [1,2,3]
Wang, Miao [3]
Bai, Fan [1,2]
Li, Bingjia [1,2]
Wang, Jie [3]
Zhou, Shuigeng [1,2]
Affiliations
[1] Fudan Univ, Shanghai Key Lab Intelligent Informat Proc, Shanghai 200438, Peoples R China
[2] Fudan Univ, Sch Comp Sci, Shanghai 200438, Peoples R China
[3] ByteDance, Beijing, Peoples R China
Source
PROCEEDINGS OF THE THIRTY-FIRST INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2022 | 2022
Keywords
NETWORK;
DOI
Not available
CLC number
TP18 [Theory of artificial intelligence];
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Scene text image super-resolution (STISR) is widely regarded as an important pre-processing task for recognizing text in low-resolution scene text images. Most recent approaches use the recognizer's feedback as a clue to guide super-resolution. However, directly using this recognition clue has two problems: 1) Compatibility: it is a probability distribution, and thus has an obvious modality gap with STISR, which is a pixel-level task; 2) Inaccuracy: it often contains wrong information that misleads the main task and degrades super-resolution performance. In this paper, we present C3-STISR, a novel method that jointly exploits the recognizer's feedback, visual information, and linguistic information as clues to guide super-resolution. Here, the visual clue comes from images of the texts predicted by the recognizer, which are informative and more compatible with the STISR task, while the linguistic clue is generated by a pre-trained character-level language model, which is able to correct the predicted texts. We design effective extraction and fusion mechanisms for the triple cross-modal clues to generate a comprehensive and unified guidance for super-resolution. Extensive experiments on TextZoom show that C3-STISR outperforms the SOTA methods in fidelity and recognition performance. Code is available at https://github.com/zhaominyiz/C3-STISR.
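To make the triple-clue idea in the abstract concrete, the sketch below shows one plausible way to fuse a recognition clue (per-character probabilities), a visual clue (a rendered image of the predicted text), and a linguistic clue (language-model-corrected character ids) into a single guidance map that could condition a super-resolution backbone. This is a minimal PyTorch sketch under assumed names (TripleClueFusion, rec_probs, rendered_text, corrected_ids, out_hw); it is not the authors' implementation, which is available at https://github.com/zhaominyiz/C3-STISR.

# Hypothetical sketch of triple-clue fusion; names are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TripleClueFusion(nn.Module):
    """Fuse recognition, visual, and linguistic clues into one guidance map."""

    def __init__(self, vocab_size=37, clue_dim=64):
        super().__init__()
        # Recognition clue: per-step character probabilities from the recognizer.
        self.rec_proj = nn.Linear(vocab_size, clue_dim)
        # Visual clue: a rendered grayscale image of the predicted text string.
        self.vis_conv = nn.Conv2d(1, clue_dim, kernel_size=3, padding=1)
        # Linguistic clue: character ids corrected by a character-level LM.
        self.ling_embed = nn.Embedding(vocab_size, clue_dim)
        # Fusion: concatenate the three clues and mix them channel-wise.
        self.fuse = nn.Conv2d(3 * clue_dim, clue_dim, kernel_size=1)

    def forward(self, rec_probs, rendered_text, corrected_ids, out_hw):
        h, w = out_hw
        # (B, T, V) -> (B, C, 1, T) -> broadcast to the feature resolution.
        rec = self.rec_proj(rec_probs).permute(0, 2, 1).unsqueeze(2)
        rec = F.interpolate(rec, size=(h, w), mode="bilinear", align_corners=False)
        # (B, 1, H', W') rendered glyph image -> (B, C, H, W).
        vis = self.vis_conv(F.interpolate(rendered_text, size=(h, w),
                                          mode="bilinear", align_corners=False))
        # (B, T) corrected character ids -> (B, C, 1, T) -> (B, C, H, W).
        ling = self.ling_embed(corrected_ids).permute(0, 2, 1).unsqueeze(2)
        ling = F.interpolate(ling, size=(h, w), mode="bilinear", align_corners=False)
        return self.fuse(torch.cat([rec, vis, ling], dim=1))

if __name__ == "__main__":
    fusion = TripleClueFusion()
    guidance = fusion(
        rec_probs=torch.softmax(torch.randn(2, 26, 37), dim=-1),  # recognizer output
        rendered_text=torch.rand(2, 1, 32, 128),                  # drawn predicted text
        corrected_ids=torch.randint(0, 37, (2, 26)),              # LM-corrected characters
        out_hw=(16, 64),                                          # LR feature size
    )
    print(guidance.shape)  # torch.Size([2, 64, 16, 64])

The concatenation plus 1x1 convolution stands in for the paper's fusion mechanism only to keep the sketch short; the key point is that all three clues are projected into the same spatial, pixel-aligned representation before guiding super-resolution.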
Pages: 1707-1713
Number of pages: 7