JSTR: Judgment Improves Scene Text Recognition

Cited by: 0
Authors
Fujitake, Masato [1]
Affiliation
[1] Fast Accounting Co Ltd, FA Res, Tokyo, Japan
Source
INTELLIGENT SYSTEMS AND APPLICATIONS, VOL 1, INTELLISYS 2024 | 2024 / Vol. 1065
Keywords
Scene text recognition; Computer vision; Machine learning;
DOI
10.1007/978-3-031-66329-1_13
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we present a method for enhancing the accuracy of scene text recognition by judging whether an image and a text string match each other. While previous studies focused on generating recognition results from input images, our approach also considers the model's misrecognition results to understand its error tendencies, thus improving the text recognition pipeline. By predicting whether an image-text pair is correct or incorrect, the method provides explicit feedback on the data that the model is likely to misrecognize, which boosts text recognition accuracy. Experimental results on publicly available datasets demonstrate that the proposed method outperforms the baseline and state-of-the-art methods in scene text recognition.
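The abstract does not specify the architecture, loss, or training procedure, so the following is only a minimal sketch of one way the image-text judgment data it describes could be assembled. It pairs each image with its ground-truth text as a "match" and with the recognizer's own misrecognized output as a "mismatch". The names `JudgmentSample` and `build_judgment_samples`, and the dictionary-based inputs, are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JudgmentSample:
    """One image-text pair with a match/mismatch label (hypothetical structure)."""
    image_id: str
    text: str
    is_match: bool


def build_judgment_samples(ground_truth, predictions):
    """Build judgment pairs from labels and a recognizer's outputs.

    ground_truth: dict mapping image id -> correct transcription.
    predictions:  dict mapping image id -> the recognizer's transcription.
    This is an assumed data layout for illustration only.
    """
    samples = []
    for image_id, true_text in ground_truth.items():
        # The ground-truth transcription is always a positive (matching) pair.
        samples.append(JudgmentSample(image_id, true_text, True))
        predicted = predictions.get(image_id)
        # A misrecognition becomes a negative pair, which exposes the
        # recognizer's own error tendencies to the judgment task.
        if predicted is not None and predicted != true_text:
            samples.append(JudgmentSample(image_id, predicted, False))
    return samples


if __name__ == "__main__":
    truth = {"img_001": "coffee", "img_002": "exit"}
    preds = {"img_001": "coffee", "img_002": "exlt"}
    for sample in build_judgment_samples(truth, preds):
        print(sample)
```

In this sketch, correctly recognized images contribute only a positive pair, while misrecognized images contribute both a positive and a negative pair, so the negatives concentrate on the inputs the recognizer actually gets wrong.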
Pages: 178-187
Page count: 10