LABT: A Sequence-to-Sequence Model for Mongolian Handwritten Text Recognition with Local Aggregation BiLSTM and Transformer

Times Cited: 0
Authors
Li, Yu [1 ,2 ,3 ]
Wei, Hongxi [1 ,2 ,3 ]
Sun, Shiwen [1 ,2 ,3 ]
Affiliations
[1] Inner Mongolia Univ, Sch Comp Sci, Hohhot 010010, Peoples R China
[2] Prov Key Lab Mongolian Informat Proc Technol, Hohhot 010010, Peoples R China
[3] Natl & Local Joint Engn Res Ctr Mongolian Informa, Hohhot 010010, Peoples R China
Source
DOCUMENT ANALYSIS AND RECOGNITION - ICDAR 2024, PT II | 2024 / Vol. 14805
Keywords
Mongolian handwritten text recognition; BiLSTM; Local aggregation; Transformer; ATTENTION;
DOI
10.1007/978-3-031-70536-6_21
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Mongolian handwritten text recognition poses challenges due to the unique characteristics of the Mongolian script, its large vocabulary, and the presence of out-of-vocabulary (OOV) words. This paper proposes a model that uses a local aggregation BiLSTM for sequence modeling of visual features and a Transformer for word prediction. Specifically, we introduce a local aggregation operation into the BiLSTM (Bidirectional Long Short-Term Memory) that improves contextual understanding by aggregating adjacent information at each time step. The improved BiLSTM is able to capture context-dependent letter-shape changes that occur in different contexts. It effectively addresses the difficulty of accurately identifying variable letters and of generating OOV words without relying on predefined words during training. The contextual features extracted by the BiLSTM are passed through multiple Transformer encoder and decoder layers. At each layer, the representations of the previous layer are accessible, allowing the layered representations to be progressively refined. By using these hierarchical representations, accurate predictions can be made even in large-vocabulary text recognition tasks. Our proposed model achieves state-of-the-art performance on two commonly used Mongolian handwritten text recognition datasets.
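The local aggregation operation described above can be illustrated with a minimal sketch: before the BiLSTM consumes the visual feature sequence, each time step's feature vector is replaced by an aggregate of its neighbors within a small window. The window radius and mean pooling used here are assumptions for illustration; the paper's exact aggregation operation may differ.

```python
def local_aggregate(features, radius=1):
    """Aggregate each time step with its +/- radius neighbors (mean pooling).

    features: list of T feature vectors (lists of floats), e.g. columns of a
              CNN feature map sliced along the writing direction
    radius:   how many neighboring time steps to pool on each side (assumed)
    returns:  a list of T aggregated vectors, one per original time step
    """
    T = len(features)
    dim = len(features[0])
    out = []
    for t in range(T):
        # Clamp the window to the sequence boundaries at both ends.
        lo, hi = max(0, t - radius), min(T, t + radius + 1)
        window = features[lo:hi]
        # Mean-pool each feature dimension over the window.
        out.append([sum(v[d] for v in window) / len(window) for d in range(dim)])
    return out

# Example: a 4-step sequence of 2-D features.
seq = [[0.0, 0.0], [2.0, 2.0], [4.0, 4.0], [6.0, 6.0]]
agg = local_aggregate(seq, radius=1)
# → [[1.0, 1.0], [2.0, 2.0], [4.0, 4.0], [5.0, 5.0]]
```

In a full model, the aggregated sequence `agg` would then be fed to the BiLSTM, so each recurrent step already sees a neighborhood of visual evidence rather than a single frame, which is how the abstract's "aggregating adjacent information at each time step" can help with context-dependent letter shapes.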
Pages: 352-363 (12 pages)