JSTR: Judgment Improves Scene Text Recognition

Cited by: 0

Author:

Fujitake, Masato [1]

Affiliation:

[1] Fast Accounting Co Ltd, FA Res, Tokyo, Japan

Source:

INTELLIGENT SYSTEMS AND APPLICATIONS, VOL 1, INTELLISYS 2024 | 2024 / Vol. 1065

Keywords:

Scene text recognition; Computer vision; Machine learning

DOI:

10.1007/978-3-031-66329-1_13

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

In this paper, we present a method for improving the accuracy of scene text recognition by judging whether an image and a text string match. While previous studies focused on generating recognition results from input images, our approach also considers the model's own misrecognition results to understand its error tendencies, thereby improving the text recognition pipeline. By predicting whether each image-text pair is correct or incorrect, the method provides explicit feedback on the data that the model is likely to misrecognize, which boosts recognition accuracy. Experimental results on publicly available datasets show that the proposed method outperforms the baseline and state-of-the-art methods in scene text recognition.


Pages: 178-187

Page count: 10

