Scene Text Recognition with Transformer using Multi-patches

Authors
Wang Y. [1]
Ha J.-E. [2]
Affiliations
[1] Graduate School of Automotive Engineering, Seoul National University of Science and Technology
[2] Department of Mechanical and Automotive Engineering, Seoul National University of Science and Technology
Keywords
Deep learning; Scene text recognition; Transformer
DOI
10.5302/J.ICROS.2022.22.0107
Abstract
In this paper, we explore the application of the Vision Transformer (ViT) to the scene text recognition (STR) task. As a popular research direction in computer vision, scene text recognition enables computers to recognize or read text in natural scenes, such as object labels, text descriptions, and text on road signs. At present, traditional convolutional neural network (CNN)-based models perform better. Still, when faced with complex backgrounds and irregular scene text images, CNN-based models are difficult to improve on curved text, diverse fonts, distortions, and similar cases. With the application of transformers in computer vision, transformer-based model structures have also developed significantly. Although current transformer-based models can achieve performance similar to that of CNN-based structures, they are at an early stage of application, and there is much room for research and improvement. We propose a multi-scale vertical rectangular patch model (MSVSTR) that makes the transformer-based feature extractor more suitable for text images. Arranging the patches in a single direction makes the cropping of the image into patches better match the way text is distributed in a text image. In addition, to handle the different numbers of characters in different texts and to extract more robust features, vertical rectangular patches of different scales are applied to crop the image. Various ablation experiments show that our structure performs better than similar transformer-based STR models. Experiments also show that our structure performs well on seven benchmarks. © ICROS 2022.
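The patching idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the 32 x 128 input size, and the patch widths (4, 8, 16) are assumptions chosen only for illustration. Each patch spans the full image height, so the patches are arranged along the horizontal direction only, and the same image is cropped at several patch widths.

```python
import numpy as np


def vertical_patches(image, patch_width):
    """Split an (H, W, C) image into full-height strips of width patch_width.

    Returns an array of shape (W // patch_width, H * patch_width * C):
    one flattened strip per row, ordered left to right.
    """
    height, width, channels = image.shape
    if width % patch_width != 0:
        raise ValueError("image width must be divisible by patch_width")
    num_patches = width // patch_width
    # (H, W, C) -> (H, N, pw, C) -> (N, H, pw, C) -> (N, H * pw * C)
    strips = image.reshape(height, num_patches, patch_width, channels)
    strips = strips.transpose(1, 0, 2, 3)
    return strips.reshape(num_patches, height * patch_width * channels)


def multi_scale_vertical_patches(image, patch_widths=(4, 8, 16)):
    """Crop the same image with vertical patches at several widths.

    The widths are hypothetical; the abstract does not state the scales used.
    """
    return {pw: vertical_patches(image, pw) for pw in patch_widths}


# Hypothetical 32 x 128 RGB text image.
image = np.zeros((32, 128, 3), dtype=np.float32)
tokens = multi_scale_vertical_patches(image)
for patch_width, sequence in tokens.items():
    print(patch_width, sequence.shape)
```

Narrower patches yield longer token sequences and wider patches shorter ones. How the sequences from different scales are embedded and combined inside the transformer is not described in the abstract and is left out of this sketch.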
Pages: 862-867
Number of pages: 5