LABT: A Sequence-to-Sequence Model for Mongolian Handwritten Text Recognition with Local Aggregation BiLSTM and Transformer

Cited: 0
Authors
Li, Yu [1 ,2 ,3 ]
Wei, Hongxi [1 ,2 ,3 ]
Sun, Shiwen [1 ,2 ,3 ]
Affiliations
[1] Inner Mongolia Univ, Sch Comp Sci, Hohhot 010010, Peoples R China
[2] Prov Key Lab Mongolian Informat Proc Technol, Hohhot 010010, Peoples R China
[3] Natl & Local Joint Engn Res Ctr Mongolian Informa, Hohhot 010010, Peoples R China
Source
DOCUMENT ANALYSIS AND RECOGNITION-ICDAR 2024, PT II | 2024 / Vol. 14805
Keywords
Mongolian handwritten text recognition; BiLSTM; Local aggregation; Transformer; ATTENTION;
DOI
10.1007/978-3-031-70536-6_21
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Mongolian handwritten text recognition is challenging due to the unique characteristics of the Mongolian script, its large vocabulary, and the presence of out-of-vocabulary (OOV) words. This paper proposes a model that uses a local aggregation BiLSTM for sequence modeling of visual features and a Transformer for word prediction. Specifically, we introduce a local aggregation operation into the BiLSTM (Bidirectional Long Short-Term Memory) that improves contextual understanding by aggregating information from adjacent time steps. The improved BiLSTM captures context dependencies and the letter-shape variations that occur in different contexts, which addresses the difficulty of accurately recognizing variable letters and enables the generation of OOV words without relying on a predefined vocabulary during training. The contextual features extracted by the BiLSTM are passed through multiple Transformer encoder and decoder layers. Each layer has access to the representations of the previous layer, so the layered representations are progressively refined. These hierarchical representations enable accurate predictions even in large-vocabulary text recognition tasks. Our proposed model achieves state-of-the-art performance on two commonly used Mongolian handwritten text recognition datasets.
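The abstract does not specify the exact aggregation operator, so the following PyTorch sketch is only a minimal illustration of the idea: a depthwise 1-D convolution pools each feature channel over a small window of adjacent time steps, and the pooled context is fused with the input before the BiLSTM. The class and parameter names (LocalAggregationBiLSTM, window) are hypothetical, not taken from the paper.

import torch
import torch.nn as nn

class LocalAggregationBiLSTM(nn.Module):
    """Sketch of a BiLSTM whose inputs are enriched with features
    aggregated from a small window of adjacent time steps.

    NOTE: the windowed depthwise convolution below is an assumption;
    the paper's abstract only states that adjacent information is
    aggregated at each time step, without giving the operator.
    """

    def __init__(self, in_dim: int, hidden_dim: int, window: int = 3):
        super().__init__()
        # Depthwise 1-D convolution: each feature channel is pooled
        # over `window` neighbouring time steps (padding keeps length).
        self.local_agg = nn.Conv1d(
            in_dim, in_dim, kernel_size=window,
            padding=window // 2, groups=in_dim,
        )
        self.bilstm = nn.LSTM(
            in_dim, hidden_dim, batch_first=True, bidirectional=True
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, in_dim) visual feature sequence from a CNN
        agg = self.local_agg(x.transpose(1, 2)).transpose(1, 2)
        out, _ = self.bilstm(x + agg)  # fuse local context, then BiLSTM
        return out                     # (batch, time, 2 * hidden_dim)

if __name__ == "__main__":
    feats = torch.randn(2, 50, 256)           # 50 frames of visual features
    ctx = LocalAggregationBiLSTM(256, 256)(feats)
    print(ctx.shape)                           # torch.Size([2, 50, 512])

In this reading, the contextual output would then be consumed by a standard Transformer encoder-decoder for word prediction, as the abstract describes.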
Pages: 352-363
Number of pages: 12