LABT: A Sequence-to-Sequence Model for Mongolian Handwritten Text Recognition with Local Aggregation BiLSTM and Transformer

Times Cited: 0
Authors
Li, Yu [1 ,2 ,3 ]
Wei, Hongxi [1 ,2 ,3 ]
Sun, Shiwen [1 ,2 ,3 ]
Affiliations
[1] Inner Mongolia Univ, Sch Comp Sci, Hohhot 010010, Peoples R China
[2] Prov Key Lab Mongolian Informat Proc Technol, Hohhot 010010, Peoples R China
[3] Natl & Local Joint Engn Res Ctr Mongolian Informa, Hohhot 010010, Peoples R China
Source
DOCUMENT ANALYSIS AND RECOGNITION - ICDAR 2024, PT II | 2024 / Vol. 14805
Keywords
Mongolian handwritten text recognition; BiLSTM; Local aggregation; Transformer; ATTENTION;
DOI
10.1007/978-3-031-70536-6_21
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Mongolian handwritten text recognition poses challenges due to the unique characteristics of the Mongolian script, its large vocabulary, and the presence of out-of-vocabulary (OOV) words. This paper proposes a model that uses a local aggregation BiLSTM for sequence modeling of visual features and a Transformer for word prediction. Specifically, we introduce a local aggregation operation into the BiLSTM (Bidirectional Long Short-Term Memory) that improves contextual understanding by aggregating adjacent information at each time step. The improved BiLSTM is able to capture context-dependent letter-shape changes that occur in different contexts. It effectively addresses the difficulty of accurately identifying variable letters and of generating OOV words without relying on predefined words during training. The contextual features extracted by the BiLSTM are passed through multiple Transformer encoder and decoder layers. At each layer, the representations of the previous layer are accessible, allowing the layered representations to be progressively refined. By using these hierarchical representations, accurate predictions can be made even in large-vocabulary text recognition tasks. Our proposed model achieves state-of-the-art performance on two commonly used Mongolian handwritten text recognition datasets.
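The local aggregation operation described above can be illustrated with a minimal sketch: before the BiLSTM consumes the visual feature sequence, each time step's feature vector is replaced by an aggregate of its neighbors within a small window. The window radius and mean pooling used here are assumptions for illustration; the paper's exact aggregation operation may differ.

```python
def local_aggregate(features, radius=1):
    """Aggregate each time step with its +/- radius neighbors (mean pooling).

    features: list of T feature vectors (lists of floats), e.g. columns of a
              CNN feature map sliced along the writing direction
    radius:   how many neighboring time steps to pool on each side (assumed)
    returns:  a list of T aggregated vectors, one per original time step
    """
    T = len(features)
    dim = len(features[0])
    out = []
    for t in range(T):
        # Clamp the window to the sequence boundaries at both ends.
        lo, hi = max(0, t - radius), min(T, t + radius + 1)
        window = features[lo:hi]
        # Mean-pool each feature dimension over the window.
        out.append([sum(v[d] for v in window) / len(window) for d in range(dim)])
    return out

# Example: a 4-step sequence of 2-D features.
seq = [[0.0, 0.0], [2.0, 2.0], [4.0, 4.0], [6.0, 6.0]]
agg = local_aggregate(seq, radius=1)
# → [[1.0, 1.0], [2.0, 2.0], [4.0, 4.0], [5.0, 5.0]]
```

In a full model, the aggregated sequence `agg` would then be fed to the BiLSTM, so each recurrent step already sees a neighborhood of visual evidence rather than a single frame, which is how the abstract's "aggregating adjacent information at each time step" can help with context-dependent letter shapes.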
Pages: 352-363 (12 pages)