TIRec: Transformer-based Invoice Text Recognition

Authors
Chen, Yanlan [1]
Affiliation
[1] Beijing Univ Posts & Telecommun, Sch Artificial Intelligence, Beijing, Peoples R China
Keywords
Text recognition; Invoice; Convolutional Vision Transformer
DOI
10.1145/3590003.3590034
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification code
081104; 0812; 0835; 1405
Abstract
This paper proposes a novel invoice text recognition model. In the past few years, researchers have explored text recognition methods that use RNN-like structures to model semantic information. However, RNN-based approaches have obvious drawbacks, such as step-by-step decoding and one-way serial transmission of semantic information, which greatly limit both the effectiveness of the semantic information and computational efficiency. Invoice text, by contrast, has strong contextual relationships because of its fixed text patterns; its fonts are more uniform and its backgrounds are much less complex than those of natural scenes. To further exploit these contextual relationships and adapt to the characteristics of invoice text, we propose a new text recognition framework inspired by the Transformer [1]. Self-attention-based architectures, in particular the Transformer, have been successful in natural language processing (NLP), where they have demonstrated powerful semantic modeling capabilities. Inspired by this success, we apply the Transformer to invoice text recognition. Unlike RNN-based approaches, we reduce the parameters of the vision network used to extract image features, use the Convolutional Vision Transformer attention module to capture semantic information, and use the Transformer decoding module to decode all characters in parallel. We hope that this Transformer-based architecture can better model the semantic information in invoices while remaining lightweight. We also collected more than 40,000 text images from train invoices, VAT invoices, rolled invoices, and cab invoices. Experiments on the collected invoice text recognition dataset show that our approach outperforms previous methods in both accuracy and speed.
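As a rough illustration of the two ideas the abstract contrasts with RNN decoding, the Python sketch below shows self-attention mixing information across all positions at once, followed by every character being decoded in one pass. It is not the TIRec implementation: the function names are hypothetical, the attention uses identity query/key/value projections instead of learned ones, and the per-position logits are hand-written rather than produced by a trained decoder.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(features):
    """Scaled dot-product self-attention over a sequence of feature vectors.

    Assumption: queries, keys, and values are the features themselves
    (identity projections), which a real model would replace with learned
    linear layers.
    """
    dim = len(features[0])
    outputs = []
    for query in features:
        # Every position attends to every other position, in both directions.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in features
        ]
        weights = softmax(scores)
        outputs.append(
            [sum(w * value[j] for w, value in zip(weights, features))
             for j in range(dim)]
        )
    return outputs


def decode_parallel(position_logits, charset):
    """Pick the best character for each position independently.

    No position waits for the previous position's prediction, unlike
    step-by-step RNN decoding.
    """
    return "".join(
        charset[max(range(len(row)), key=row.__getitem__)]
        for row in position_logits
    )


# Toy inputs (hypothetical values, for illustration only).
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mixed = self_attention(features)
text = decode_parallel(
    [[0.1, 2.0, 0.3], [1.5, 0.2, 0.1], [0.0, 0.1, 3.0]], "ABC"
)
```

Because each output position depends only on the shared input features and not on earlier predictions, the per-position work could be batched on parallel hardware, which is the efficiency argument the abstract makes.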
Pages: 175-180
Page count: 6