C3-STISR: Scene Text Image Super-resolution with Triple Clues

Cited: 0
Authors
Zhao, Minyi [1,2,3]
Wang, Miao [3]
Bai, Fan [1,2]
Li, Bingjia [1,2]
Wang, Jie [3]
Zhou, Shuigeng [1,2]
Affiliations
[1] Fudan Univ, Shanghai Key Lab Intelligent Informat Proc, Shanghai 200438, Peoples R China
[2] Fudan Univ, Sch Comp Sci, Shanghai 200438, Peoples R China
[3] ByteDance, Beijing, Peoples R China
Source
PROCEEDINGS OF THE THIRTY-FIRST INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2022 | 2022
Keywords
NETWORK;
DOI
Not available
CLC number
TP18 [Theory of artificial intelligence];
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Scene text image super-resolution (STISR) is widely regarded as an important pre-processing task for recognizing text in low-resolution scene text images. Most recent approaches use the recognizer's feedback as a clue to guide super-resolution. However, directly using this recognition clue has two problems: 1) Compatibility: it is a probability distribution, and thus has an obvious modality gap with STISR, which is a pixel-level task; 2) Inaccuracy: it often contains wrong information that misleads the main task and degrades super-resolution performance. In this paper, we present C3-STISR, a novel method that jointly exploits the recognizer's feedback, visual information, and linguistic information as clues to guide super-resolution. Here, the visual clue comes from images of the texts predicted by the recognizer, which are informative and more compatible with the STISR task, while the linguistic clue is generated by a pre-trained character-level language model, which is able to correct the predicted texts. We design effective extraction and fusion mechanisms for the triple cross-modal clues to generate a comprehensive and unified guidance for super-resolution. Extensive experiments on TextZoom show that C3-STISR outperforms the SOTA methods in fidelity and recognition performance. Code is available at https://github.com/zhaominyiz/C3-STISR.
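To make the triple-clue idea in the abstract concrete, the sketch below shows one plausible way to fuse a recognition clue (per-character probabilities), a visual clue (a rendered image of the predicted text), and a linguistic clue (language-model-corrected character ids) into a single guidance map that could condition a super-resolution backbone. This is a minimal PyTorch sketch under assumed names (TripleClueFusion, rec_probs, rendered_text, corrected_ids, out_hw); it is not the authors' implementation, which is available at https://github.com/zhaominyiz/C3-STISR.

# Hypothetical sketch of triple-clue fusion; names are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TripleClueFusion(nn.Module):
    """Fuse recognition, visual, and linguistic clues into one guidance map."""

    def __init__(self, vocab_size=37, clue_dim=64):
        super().__init__()
        # Recognition clue: per-step character probabilities from the recognizer.
        self.rec_proj = nn.Linear(vocab_size, clue_dim)
        # Visual clue: a rendered grayscale image of the predicted text string.
        self.vis_conv = nn.Conv2d(1, clue_dim, kernel_size=3, padding=1)
        # Linguistic clue: character ids corrected by a character-level LM.
        self.ling_embed = nn.Embedding(vocab_size, clue_dim)
        # Fusion: concatenate the three clues and mix them channel-wise.
        self.fuse = nn.Conv2d(3 * clue_dim, clue_dim, kernel_size=1)

    def forward(self, rec_probs, rendered_text, corrected_ids, out_hw):
        h, w = out_hw
        # (B, T, V) -> (B, C, 1, T) -> broadcast to the feature resolution.
        rec = self.rec_proj(rec_probs).permute(0, 2, 1).unsqueeze(2)
        rec = F.interpolate(rec, size=(h, w), mode="bilinear", align_corners=False)
        # (B, 1, H', W') rendered glyph image -> (B, C, H, W).
        vis = self.vis_conv(F.interpolate(rendered_text, size=(h, w),
                                          mode="bilinear", align_corners=False))
        # (B, T) corrected character ids -> (B, C, 1, T) -> (B, C, H, W).
        ling = self.ling_embed(corrected_ids).permute(0, 2, 1).unsqueeze(2)
        ling = F.interpolate(ling, size=(h, w), mode="bilinear", align_corners=False)
        return self.fuse(torch.cat([rec, vis, ling], dim=1))

if __name__ == "__main__":
    fusion = TripleClueFusion()
    guidance = fusion(
        rec_probs=torch.softmax(torch.randn(2, 26, 37), dim=-1),  # recognizer output
        rendered_text=torch.rand(2, 1, 32, 128),                  # drawn predicted text
        corrected_ids=torch.randint(0, 37, (2, 26)),              # LM-corrected characters
        out_hw=(16, 64),                                          # LR feature size
    )
    print(guidance.shape)  # torch.Size([2, 64, 16, 64])

The concatenation plus 1x1 convolution stands in for the paper's fusion mechanism only to keep the sketch short; the key point is that all three clues are projected into the same spatial, pixel-aligned representation before guiding super-resolution.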
Pages: 1707-1713
Number of pages: 7