A Feature Refinement Patch Embedding-Based Recognition Method for Printed Tibetan Cursive Script

Cited by: 0

Authors:

Zhi, Cai Rang Dang [1,2,3]

Huang, Heming [1,2,3]

Fan, Yonghong [1,2,3]

Song, Dongke [1,2,3]

Affiliations:

[1] Qinghai Normal Univ, Sch Comp Sci & Technol, Xining 810008, Peoples R China

[2] State Key Lab Tibetan Intelligent Informat Proc &, Xining 810008, Peoples R China

[3] Minist Educ, Key Lab Tibetan Informat Proc, Xining 810008, Peoples R China

Source:

PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT II | 2024 / Vol. 14426

Funding:

U.S. National Science Foundation;

Keywords:

Tibetan recognition; cursive scripts; feature refinement patch embedding; Transformer; TEXT; TRANSFORMER;

DOI:

10.1007/978-981-99-8432-9_31

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Recognition of Tibetan cursive scripts has important applications in automated Tibetan office software and the conservation of ancient documents. However, few studies address the recognition of Tibetan cursive scripts. This paper proposes a printed Tibetan cursive script recognition method based on feature refinement patch embedding. First, the feature refinement patch embedding (FRPE) module serializes the text-line image into feature sequences. Second, a single Transformer encoder performs global modeling of the feature vectors. Finally, a fully connected layer decodes the recognition result. Experimental results show that, compared with the baseline model, the proposed method improves accuracy by 9.52% on CSTPD, a dataset containing six Tibetan cursive fonts, and achieves an average accuracy of 92.5% on that dataset. It also outperforms the baseline model on synthetic Tibetan text recognition data for natural scene images.
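
The three stages named in the abstract (patch embedding, a single Transformer encoder, and a fully connected output layer) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not describe the internals of FRPE, so a plain non-overlapping patch embedding stands in for it, and the function names, patch size, model width, class count, random weights, and per-token argmax readout are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def patch_embed(image, patch_h, patch_w, w_proj):
    """Split an (H, W) image into non-overlapping patches and project each one.

    Assumption: a plain linear patch embedding, used here as a stand-in for FRPE.
    """
    h, w = image.shape
    rows, cols = h // patch_h, w // patch_w
    patches = (image[:rows * patch_h, :cols * patch_w]
               .reshape(rows, patch_h, cols, patch_w)
               .transpose(0, 2, 1, 3)
               .reshape(rows * cols, patch_h * patch_w))
    return patches @ w_proj  # shape: (num_patches, d_model)

def layer_norm(x, eps=1e-5):
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def encoder_layer(x, p):
    """One pre-norm Transformer encoder layer with single-head self-attention."""
    h = layer_norm(x)
    q, k, v = h @ p["wq"], h @ p["wk"], h @ p["wv"]
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # every token attends to all tokens
    x = x + (attn @ v) @ p["wo"]                    # residual around attention
    h = layer_norm(x)
    return x + np.maximum(h @ p["w1"], 0.0) @ p["w2"]  # residual around ReLU MLP

def init_params(patch_h=32, patch_w=4, d_model=64, num_classes=100):
    """Random, untrained weights; all sizes are hypothetical."""
    def w(*shape):
        return rng.normal(0.0, 0.02, size=shape)
    return {
        "patch_h": patch_h, "patch_w": patch_w,
        "w_proj": w(patch_h * patch_w, d_model),
        "wq": w(d_model, d_model), "wk": w(d_model, d_model),
        "wv": w(d_model, d_model), "wo": w(d_model, d_model),
        "w1": w(d_model, 4 * d_model), "w2": w(4 * d_model, d_model),
        "w_fc": w(d_model, num_classes),
    }

def recognize(image, p):
    """Patch embedding -> single encoder layer -> fully connected layer -> class ids."""
    tokens = patch_embed(image, p["patch_h"], p["patch_w"], p["w_proj"])
    encoded = encoder_layer(tokens, p)
    logits = layer_norm(encoded) @ p["w_fc"]
    return logits.argmax(axis=-1)  # one predicted class id per patch position

params = init_params()
line_image = rng.random((32, 128))   # hypothetical 32x128 grayscale text-line image
class_ids = recognize(line_image, params)
print(class_ids.shape)
```

With the assumed 32x4 patches, a 32x128 line image becomes a sequence of 32 tokens, so the output holds one class id per horizontal position. Because the weights are random, the predictions are meaningless; the sketch only shows how data flows through the three stages.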

Citation:

Pages: 383-399

Page count: 17
