LABT: A Sequence-to-Sequence Model for Mongolian Handwritten Text Recognition with Local Aggregation BiLSTM and Transformer

Cited by: 0
Authors
Li, Yu [1 ,2 ,3 ]
Wei, Hongxi [1 ,2 ,3 ]
Sun, Shiwen [1 ,2 ,3 ]
Affiliations
[1] Inner Mongolia Univ, Sch Comp Sci, Hohhot 010010, Peoples R China
[2] Prov Key Lab Mongolian Informat Proc Technol, Hohhot 010010, Peoples R China
[3] Natl & Local Joint Engn Res Ctr Mongolian Informa, Hohhot 010010, Peoples R China
Source
DOCUMENT ANALYSIS AND RECOGNITION-ICDAR 2024, PT II | 2024 / Vol. 14805
Keywords
Mongolian handwritten text recognition; BiLSTM; Local aggregation; Transformer; Attention
DOI
10.1007/978-3-031-70536-6_21
CLC Classification
TP18 [Artificial Intelligence Theory]
Subject Classification
081104; 0812; 0835; 1405
Abstract
Mongolian handwritten text recognition poses challenges arising from the unique characteristics of the Mongolian script, its large vocabulary, and the presence of out-of-vocabulary (OOV) words. This paper proposes a model that uses a local aggregation BiLSTM for sequence modeling of visual features and a Transformer for word prediction. Specifically, we introduce a local aggregation operation into the BiLSTM (Bidirectional Long Short-Term Memory) that improves contextual understanding by aggregating adjacent information at each time step. The improved BiLSTM can capture the context-dependent letter-shape changes that occur in different contexts, effectively addressing the difficulty of accurately recognizing variable letters and generating OOV words without relying on predefined words during training. The contextual features extracted by the BiLSTM are passed through multiple layers of the Transformer's encoder and decoder. At each layer, the representations of the previous layers are accessible, allowing the layered representations to be progressively refined. By using these hierarchical representations, accurate predictions can be made even in large-vocabulary text recognition tasks. Our proposed model achieves state-of-the-art performance on two commonly used Mongolian handwritten text recognition datasets.
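The abstract does not give the exact form of the local aggregation operation; a minimal sketch of the general idea, under the assumption that each time step is fused with its neighbors via mean pooling over a small window (window size 3 here is purely illustrative) before the sequence enters the BiLSTM, might look like:

```python
import numpy as np

def local_aggregation(features, window=3):
    """Aggregate each time step with its neighbors before BiLSTM input.

    features: (T, D) array of per-time-step visual features.
    Mean pooling and the window size are illustrative assumptions;
    the paper's actual aggregation operator may differ.
    """
    T, _ = features.shape
    half = window // 2
    out = np.zeros_like(features)
    for t in range(T):
        # Clamp the window at the sequence boundaries.
        lo, hi = max(0, t - half), min(T, t + half + 1)
        out[t] = features[lo:hi].mean(axis=0)  # fuse adjacent context
    return out

# Toy example: 5 time steps, 2-dimensional features.
x = np.arange(10, dtype=float).reshape(5, 2)
y = local_aggregation(x)  # y[2] is the mean of rows 1..3 of x
```

The aggregated sequence would then be fed to a standard BiLSTM, whose outputs serve as the contextual features consumed by the Transformer encoder-decoder described in the abstract.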
Pages: 352-363 (12 pages)
Related Papers
42 in total
  • [1] CORRECTION OF AUTOMATIC SPEECH RECOGNITION WITH TRANSFORMER SEQUENCE-TO-SEQUENCE MODEL
    Hrinchuk, Oleksii
    Popova, Mariya
    Ginsburg, Boris
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020: 7074-7078
  • [2] Effective offline handwritten text recognition model based on a sequence-to-sequence approach with CNN-RNN networks
    Geetha, R.
    Thilagam, T.
    Padmavathy, T.
    NEURAL COMPUTING & APPLICATIONS, 2021, 33 (17): 10923-10934
  • [3] Sequence-to-Sequence Contrastive Learning for Text Recognition
    Aberdam, Aviad
    Litman, Ron
    Tsiper, Shahar
    Anschel, Oron
    Slossberg, Ron
    Mazor, Shai
    Manmatha, R.
    Perona, Pietro
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021: 15297-15307
  • [4] Retraction Note: Effective offline handwritten text recognition model based on a sequence-to-sequence approach with CNN–RNN networks
    Geetha, R.
    Thilagam, T.
    Padmavathy, T.
    NEURAL COMPUTING & APPLICATIONS, 2024, 36 (24): 15227
  • [5] Handwritten Historical Music Recognition by Sequence-to-Sequence with Attention Mechanism
    Baro, Arnau
    Badal, Carles
    Fornes, Alicia
    2020 17TH INTERNATIONAL CONFERENCE ON FRONTIERS IN HANDWRITING RECOGNITION (ICFHR 2020), 2020: 205-210
  • [6] SPEECH-TRANSFORMER: A NO-RECURRENCE SEQUENCE-TO-SEQUENCE MODEL FOR SPEECH RECOGNITION
    Dong, Linhao
    Xu, Shuang
    Xu, Bo
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018: 5884-5888
  • [7] A Sequence-to-Sequence Framework Based on Transformer With Masked Language Model for Optical Music Recognition
    Wen, Cuihong
    Zhu, Longjiao
    IEEE ACCESS, 2022, 10: 118243-118252
  • [8] Segmented Recurrent Transformer: An Efficient Sequence-to-Sequence Model
    Long, Yinghan
    Chowdhury, Sayeed Shafayet
    Roy, Kaushik
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EMNLP 2023), 2023: 8325-8337
  • [9] Dysarthric Speech Transformer: A Sequence-to-Sequence Dysarthric Speech Recognition System
    Shahamiri, Seyed Reza
    Lal, Vanshika
    Shah, Dhvani
    IEEE TRANSACTIONS ON NEURAL SYSTEMS AND REHABILITATION ENGINEERING, 2023, 31: 3407-3416
  • [10] Candidate fusion: Integrating language modelling into a sequence-to-sequence handwritten word recognition architecture
    Kang, Lei
    Riba, Pau
    Villegas, Mauricio
    Fornes, Alicia
    Rusinol, Marcal
    PATTERN RECOGNITION, 2021, 112