Chinese text recognition enhanced by glyph and character semantic information

Cited by: 4

Authors:

Wu, Shilian [1]

Li, Yongrui [1]

Wang, Zengfu [2]

Affiliations:

[1] Univ Sci & Technol China, Sch Informat Sci & Technol, Hefei 230000, Anhui, Peoples R China

[2] Univ Sci & Technol China, Inst Intelligent Machines, Hefei 230000, Anhui, Peoples R China

Source:

INTERNATIONAL JOURNAL ON DOCUMENT ANALYSIS AND RECOGNITION | 2024 / Vol. 27 / Issue 01

Keywords:

Chinese text recognition; Vision and language; Transformer

DOI:

10.1007/s10032-023-00444-9

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Chinese text line recognition technology has been applied in a variety of scenarios. As a kind of ideographic writing, Chinese characters contain plenty of semantic information and basic components. However, previous methods mainly convert each Chinese character into a discrete label to facilitate the calculation of cross-entropy loss, leaving the fine-grained glyph information (e.g., strokes and radicals) and semantic information unexploited. Concretely, glyph information is crucial for recognizing Chinese characters with similar appearances, as these characters differ only slightly in local strokes. The glyph information reflects these differences, guiding the model to learn fine-grained local features. Compared with discrete category labels, character semantic information introduces diverse visual concepts, which enrich the final character representation. This paper presents a Chinese text recognition method that exploits glyph and character semantic information to acquire effective text representations. Specifically, we propose a Glyph-Aware Decoder that identifies characters by dynamically fusing the global visual features with the local stroke and radical features. We also introduce a Contrastive Visual-Textual Learning module that enhances the visual features of Chinese characters with their semantic information. Experiments show that our proposed model achieves state-of-the-art results on Chinese text recognition benchmarks.

Pages: 45-56

Page count: 12
