Text Gestalt: Stroke-Aware Scene Text Image Super-resolution

Times cited: 0

Authors:

Chen, Jingye [1]

Yu, Haiyang [1]

Ma, Jianqi [2]

Li, Bin [1]

Xue, Xiangyang [1]

Affiliations:

[1] Fudan University, School of Computer Science, Shanghai Key Laboratory of Intelligent Information Processing, Shanghai, People's Republic of China

[2] Hong Kong Polytechnic University, Hong Kong, People's Republic of China

Source:

THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / THE TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE | 2022

Funding:

National Natural Science Foundation of China

Keywords:

RECOGNITION; NETWORK

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Over the last decade, the rise of deep learning has driven the rapid development of scene text recognition. However, recognizing low-resolution scene text images remains challenging. Although several super-resolution methods have been proposed to tackle this problem, they usually treat text images as generic images and ignore the fact that the visual quality of strokes (the atomic units of text) plays an essential role in text recognition. According to Gestalt psychology, humans are capable of composing partial details into the most similar objects, guided by prior knowledge. Likewise, when humans observe a low-resolution text image, they inherently use partial stroke-level details to recover the appearance of holistic characters. Inspired by Gestalt psychology, we put forward a Stroke-Aware Scene Text Image Super-Resolution method containing a Stroke-Focused Module (SFM) that concentrates on the stroke-level internal structures of characters in text images. Specifically, we design rules for decomposing English characters and digits at the stroke level, then pre-train a text recognizer to provide stroke-level attention maps as positional clues, which are used to enforce consistency between the generated super-resolution image and the high-resolution ground truth. Extensive experimental results show that the proposed method generates more distinguishable images on TextZoom and on Degraded-IC13, a manually constructed Chinese character dataset. Furthermore, since the proposed SFM only provides stroke-level guidance during training, it introduces no time overhead in the test phase.
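The abstract's central mechanism, using a frozen recognizer's stroke-level attention maps to keep the super-resolved (SR) image consistent with the high-resolution (HR) ground truth, can be illustrated with a minimal sketch. It assumes the consistency term is a mean L1 distance between per-step attention maps. The stroke alphabet, the `STROKE_RULES` table, and the names `decompose` and `stroke_focused_loss` are hypothetical and do not reproduce the paper's actual decomposition rules or implementation; attention maps are plain nested lists so the example needs only the Python standard library.

```python
"""Minimal, hypothetical sketch of stroke decomposition plus a stroke-focused loss."""
from typing import Dict, List, Sequence

Matrix = List[List[float]]  # one 2D attention map (rows x cols)

# HYPOTHETICAL stroke rules for illustration only; the paper's real rules for
# English letters and digits are not reproduced here.
STROKE_RULES: Dict[str, List[str]] = {
    "L": ["vertical", "horizontal"],
    "T": ["horizontal", "vertical"],
    "1": ["vertical"],
    "7": ["horizontal", "slash"],
}


def decompose(text: str) -> List[str]:
    """Map a character string to a flat stroke-level label sequence."""
    strokes: List[str] = []
    for ch in text:
        if ch not in STROKE_RULES:
            raise KeyError(f"no stroke rule for {ch!r}")
        strokes.extend(STROKE_RULES[ch])
    return strokes


def l1_map_distance(a: Matrix, b: Matrix) -> float:
    """Mean absolute difference between two equally sized 2D attention maps."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("attention maps must have the same shape")
    total = sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    count = sum(len(row) for row in a)
    return total / count


def stroke_focused_loss(sr_maps: Sequence[Matrix], hr_maps: Sequence[Matrix]) -> float:
    """Average L1 distance between per-step attention maps of SR and HR images.

    ASSUMPTION: sr_maps[t] and hr_maps[t] are the attention maps that a frozen,
    stroke-level recognizer produces at decoding step t for the SR and HR images.
    """
    if len(sr_maps) != len(hr_maps) or not sr_maps:
        raise ValueError("need the same, non-zero number of maps for SR and HR")
    return sum(l1_map_distance(s, h) for s, h in zip(sr_maps, hr_maps)) / len(sr_maps)
```

For example, `decompose("L7")` yields `['vertical', 'horizontal', 'horizontal', 'slash']`, and for two 2x2 maps per image where only the first SR map deviates by 0.5 at two of its four positions, `stroke_focused_loss` returns 0.125. In a training loop such a term would be added to the pixel-level loss only while training, which matches the abstract's point that the SFM adds no cost at test time.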

Pages: 285-293

Page count: 9

References (42 in total):

[1] [Anonymous], 2006, P 23 INT C MACH LEAR, DOI 10.1145/1143844.1143891

[2] Cai, Jianrui; Zeng, Hui; Yong, Hongwei; Cao, Zisheng; Zhang, Lei. Toward Real-World Single Image Super-Resolution: A New Benchmark and A New Model [J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019: 3086-3095.

[3] Capel D, 2000, INT C PATT RECOG, P600, DOI 10.1109/ICPR.2000.905409

[4] Chen J., 2021, IJCAI

[5] Chen, Jingye; Li, Bin; Xue, Xiangyang. Scene Text Telescope: Text-Focused Scene Image Super-Resolution [J]. 2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021: 12021-12030.

[6] Chen, Xiaoxue; Jin, Lianwen; Zhu, Yuanzhi; Luo, Canjie; Wang, Tianwei. Text Recognition in the Wild: A Survey [J]. ACM COMPUTING SURVEYS, 2021, 54(02).

[7] Cheng, Zhanzhan; Bai, Fan; Xu, Yunlu; Zheng, Gang; Pu, Shiliang; Zhou, Shuigeng. Focusing Attention: Towards Accurate Text Recognition in Natural Images [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017: 5086-5094.

[8] Dalley G, 2004, IEEE IMAGE PROC, P3295

[9] Dong, Chao; Loy, Chen Change; He, Kaiming; Tang, Xiaoou. Learning a Deep Convolutional Network for Image Super-Resolution [J]. COMPUTER VISION - ECCV 2014, PT IV, 2014, 8692: 184-199.

[10] Fenfen Sheng, 2019, 2019 International Conference on Document Analysis and Recognition (ICDAR). Proceedings, P781, DOI 10.1109/ICDAR.2019.00130
