Chinese Character Recognition based on Swin Transformer-Encoder

Cited by: 0

Authors:

Li, Ziying [1]

Zhao, Haifeng [1,2]

Nishizaki, Hiromitsu [2]

Leow, Chee Siang [2]

Shen, Xingfa [1]

Affiliations:

[1] Hangzhou Dianzi Univ, Comp Sci & Technol, Hangzhou 310018, Peoples R China

[2] Univ Yamanashi, Comp Sci & Technol, Kofu, Yamanashi 4000013, Japan

Source:

DIGITAL SIGNAL PROCESSING | 2025 / Vol. 161

Funding:

National Key Research and Development Program of China; National Natural Science Foundation of China;

Keywords:

OCR; Chinese Text Recognition; Swin Transformer; Attention; Image segmentation; STROKE EXTRACTION; NETWORK;

DOI:

10.1016/j.dsp.2025.105080

Chinese Library Classification (CLC):

TM [Electrical Engineering]; TN [Electronics and Communication Technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

Optical Character Recognition (OCR) technology, which converts printed or handwritten text into machine-readable text, holds significant application and research value in document digitization, information automation, and multilingual support. However, existing methods predominantly focus on English text recognition and often struggle with the complexities of Chinese characters. This study proposes a Chinese text recognition model based on the Swin Transformer encoder and demonstrates its adaptability to Chinese character recognition. In the image preprocessing stage, we introduce an overlapping segmentation technique that enables the encoder to capture the complex structural relationships between individual strokes in lengthy Chinese texts. Additionally, by incorporating a mapping layer between the encoder and decoder, we enhance the Swin Transformer's adaptability to small-image scenarios, thereby improving its feasibility for Chinese text recognition tasks. Experimental results indicate that the model outperforms classical models such as CRNN and ASTER on handwritten and web-based datasets, validating its robustness and reliability.
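The abstract does not give the details of the overlapping segmentation step, so the following is only a minimal sketch of the general idea: a text-line image is cut into fixed-width windows whose stride is smaller than the window width, so that strokes near a window boundary appear in two neighbouring windows. The function names `overlapping_windows` and `split_image`, and the `patch` and `stride` parameters, are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def overlapping_windows(width: int, patch: int, stride: int) -> List[Tuple[int, int]]:
    """Return (start, end) column ranges of width `patch`, stepping by `stride`.

    With stride < patch, neighbouring windows share patch - stride columns.
    If the regular steps do not reach the right edge, one extra window is
    added that ends exactly at `width`.
    """
    if not 0 < stride <= patch:
        raise ValueError("need 0 < stride <= patch")
    if width <= patch:
        # Image narrower than one window: return it whole.
        return [(0, width)]
    starts = list(range(0, width - patch + 1, stride))
    if starts[-1] + patch < width:
        starts.append(width - patch)
    return [(s, s + patch) for s in starts]


def split_image(image: List[List[int]], patch: int, stride: int) -> List[List[List[int]]]:
    """Cut a row-major 2D image (list of pixel rows) into overlapping column windows."""
    width = len(image[0])
    return [
        [row[s:e] for row in image]
        for s, e in overlapping_windows(width, patch, stride)
    ]


if __name__ == "__main__":
    # Toy 2 x 10 "image"; the values are just column indices for readability.
    image = [list(range(10)), list(range(10, 20))]
    for window in split_image(image, patch=4, stride=2):
        print(window)
```

The overlap here is an illustrative choice (half of the window width in the toy call); the actual window size and overlap ratio used by the authors are not stated in this record.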


Pages: 10

