Scene Text Recognition with Transformer using Multi-patches

Cited by: 0

Authors:

Wang Y. [1]

Ha J.-E. [2]

Affiliations:

[1] Graduate School of Automotive Engineering, Seoul National University of Science and Technology

[2] Department of Mechanical and Automotive Engineering, Seoul National University of Science and Technology

Source:

Journal of Institute of Control, Robotics and Systems | 2022 / Vol. 28 / No. 10

Keywords:

Deep learning; Scene text recognition; Transformer

DOI:

10.5302/J.ICROS.2022.22.0107

Abstract:

In this paper, we explore the application of the Vision Transformer (ViT) to scene text recognition (STR). A popular research direction in computer vision, scene text recognition enables computers to read text in natural scenes, such as object labels, text descriptions, and road signs. At present, traditional convolutional neural network (CNN)-based models achieve better performance. However, for images with complex backgrounds and irregular scene text, such as curved text, diverse fonts, and distortions, the performance of CNN-based models is difficult to improve further. With the adoption of transformers in computer vision, transformer-based model structures have also developed significantly. Although current transformer-based models can reach performance similar to that of CNN-based structures, they are still at an early stage of application, and there is much room for research and improvement. We propose a multi-scale vertical rectangular patch model (MSVSTR) that makes the transformer-based feature extractor better suited to text images. Because the patches are arranged in a single direction, cropping the image into patches better matches how text is distributed in a text image. In addition, to accommodate the different numbers of characters in different texts and to extract more robust features, vertical rectangular patches of different scales are used to crop the image. Ablation experiments show that our structure performs better than similar transformer-based STR models, and further experiments show that it performs well on seven benchmarks. © ICROS 2022.
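
The cropping idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the patch widths, the image size, or how the scales are combined, so the function names, the toy image, and the widths `[2, 4]` below are all assumptions made for illustration. The sketch only shows how full-height vertical strips of several widths turn one image into left-to-right patch sequences of different lengths.

```python
from typing import Dict, List

Image = List[List[float]]  # grayscale image stored as rows of pixels (H x W)


def vertical_patches(image: Image, patch_width: int) -> List[List[float]]:
    """Split an H x W image into full-height strips of the given width.

    Each strip is flattened row by row into a vector of length
    H * patch_width, so the sequence runs left to right along the text.
    """
    height = len(image)
    width = len(image[0])
    if width % patch_width != 0:
        raise ValueError("image width must be divisible by patch_width")
    patches = []
    for x0 in range(0, width, patch_width):
        strip = [
            image[y][x]
            for y in range(height)
            for x in range(x0, x0 + patch_width)
        ]
        patches.append(strip)
    return patches


def multi_scale_vertical_patches(
    image: Image, patch_widths: List[int]
) -> Dict[int, List[List[float]]]:
    """Crop the same image with vertical patches of several widths."""
    return {w: vertical_patches(image, w) for w in patch_widths}


# Toy 4 x 8 "image" whose pixel value encodes its column index.
toy = [[float(x) for x in range(8)] for _ in range(4)]

# Hypothetical scales; the paper's actual patch widths are not given here.
scales = multi_scale_vertical_patches(toy, [2, 4])
for w, seq in scales.items():
    # Prints: patch width, number of patches, flattened patch length.
    print(w, len(seq), len(seq[0]))
```

Narrower strips give a longer sequence with less content per patch, and wider strips give the opposite. In a real model each flattened strip would then pass through a learned linear embedding before the transformer encoder, a step omitted here.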

Pages: 862-867

Number of pages: 5
