Chinese text recognition enhanced by glyph and character semantic information

Cited by: 4
Authors
Wu, Shilian [1]
Li, Yongrui [1]
Wang, Zengfu [2]
Affiliations
[1] Univ Sci & Technol China, Sch Informat Sci & Technol, Hefei 230000, Anhui, Peoples R China
[2] Univ Sci & Technol China, Inst Intelligent Machines, Hefei 230000, Anhui, Peoples R China
Keywords
Chinese text recognition; Vision and language; Transformer
DOI
10.1007/s10032-023-00444-9
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Chinese text line recognition has been applied in a variety of scenarios. As an ideographic script, Chinese characters carry rich semantic information and are built from basic components. However, previous methods mainly convert each Chinese character into a discrete label to facilitate the calculation of cross-entropy loss, leaving fine-grained glyph information (e.g., strokes and radicals) and semantic information unexploited. Glyph information is crucial for recognizing Chinese characters with similar appearances, since such characters differ only slightly in local strokes; it reflects these differences and guides the model to learn fine-grained local features. Compared with discrete category labels, character semantic information introduces diverse visual concepts, which enriches the final character representation. This paper presents a Chinese text recognition method that exploits glyph and character semantic information to acquire effective text representations. Specifically, we propose a Glyph-Aware Decoder that identifies characters by dynamically fusing global visual features with local stroke and radical features, and we introduce a Contrastive Visual-Textual Learning module that enhances the visual features of Chinese characters with their semantic information. Experiments show that the proposed model achieves state-of-the-art results on Chinese text recognition benchmarks.
Pages: 45-56
Page count: 12