Scene Text Recognition with Transformer using Multi-patches

Authors
Wang Y. [1]
Ha J.-E. [2]
Affiliations
[1] Graduate School of Automotive Engineering, Seoul National University of Science and Technology
[2] Department of Mechanical and Automotive Engineering, Seoul National University of Science and Technology
Keywords
Deep learning; Scene text recognition; Transformer
DOI
10.5302/J.ICROS.2022.22.0107
Abstract
In this paper, we explore the application of the Vision Transformer (ViT) to the scene text recognition (STR) task. As a popular research direction in computer vision, scene text recognition enables computers to read text in natural scenes, such as object labels, text descriptions, and road signs. At present, traditional models based on convolutional neural networks (CNNs) achieve better performance. However, for scene text images with complex backgrounds and irregular layouts, it is difficult to improve the performance of CNN-based models on curved text, diverse fonts, distortions, and similar cases. With the adoption of transformers in computer vision, transformer-based model structures have also developed significantly. Although current transformer-based models can reach performance similar to that of CNN-based structures, they are still at an early stage of application, and there is much room for research and improvement. We propose a multi-scale vertical rectangular patch model (MSVSTR) that makes the transformer-based feature extractor more suitable for text images. Because the patches are arranged in a single direction only, cropping the image into patches better matches the way text is distributed in a text image. In addition, to handle the different numbers of characters in different texts and to extract more robust features, vertical rectangular patches of different scales are used to crop the image. Ablation experiments show that our structure performs better than similar transformer-based STR models. Experiments also show that our structure performs well on seven benchmarks. © ICROS 2022.
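The patching idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state patch sizes, image dimensions, or how the scales are combined, so the function names, the patch widths (2, 4, 8), the 4 x 16 toy image, and the choice of full-height strips are all assumptions made for illustration. The sketch only shows how an image can be cut into vertical rectangular patches arranged in a single (left-to-right) direction at several scales.

```python
from typing import Dict, List, Sequence

# Grayscale image stored as rows of pixel values (H x W). Hypothetical layout.
Image = List[List[float]]


def vertical_patches(image: Image, patch_width: int) -> List[List[float]]:
    """Split an image into full-height vertical strips of a given width.

    Each strip is flattened row by row into one vector (one token).
    Using the full image height for every strip is an assumption.
    """
    height = len(image)
    width = len(image[0])
    if width % patch_width != 0:
        raise ValueError("image width must be divisible by patch_width")
    patches = []
    for left in range(0, width, patch_width):
        # Flatten the strip covering columns [left, left + patch_width).
        strip = [image[row][col]
                 for row in range(height)
                 for col in range(left, left + patch_width)]
        patches.append(strip)
    return patches


def multi_scale_vertical_patches(
    image: Image, patch_widths: Sequence[int] = (2, 4, 8)
) -> Dict[int, List[List[float]]]:
    """Map each (assumed) patch width to its left-to-right token sequence."""
    return {w: vertical_patches(image, w) for w in patch_widths}


# Hypothetical 4 x 16 "text image" whose pixel value encodes its column index.
image = [[float(col) for col in range(16)] for _ in range(4)]
tokens = multi_scale_vertical_patches(image)
for w, seq in tokens.items():
    print(w, len(seq), len(seq[0]))
```

In this toy setup, narrower patches yield longer token sequences with smaller tokens, and wider patches yield shorter sequences with larger tokens. In a transformer-based recognizer, each flattened strip would typically be linearly projected to an embedding before entering the encoder; that step is omitted here.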
Pages: 862-867
Number of pages: 5