Scene Text Recognition with Permuted Autoregressive Sequence Models

Cited by: 78
Authors
Bautista, Darwin [1]
Atienza, Rowel [1]
Affiliation
[1] Univ Philippines, Elect & Elect Engn Inst, Quezon City, Philippines
Source
COMPUTER VISION - ECCV 2022, PT XXVIII | 2022 / Vol. 13688
Keywords
Scene text recognition; Permutation language modeling; Autoregressive modeling; Cross-modal attention; Transformer;
DOI
10.1007/978-3-031-19815-1_11
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Context-aware STR methods typically use internal autoregressive (AR) language models (LM). Inherent limitations of AR models motivated two-stage methods which employ an external LM. The conditional independence of the external LM on the input image may cause it to erroneously rectify correct predictions, leading to significant inefficiencies. Our method, PARSeq, learns an ensemble of internal AR LMs with shared weights using Permutation Language Modeling. It unifies context-free non-AR and context-aware AR inference, and iterative refinement using bidirectional context. Using synthetic training data, PARSeq achieves state-of-the-art (SOTA) results in STR benchmarks (91.9% accuracy) and more challenging datasets. It establishes new SOTA results (96.0% accuracy) when trained on real data. PARSeq is optimal on accuracy vs parameter count, FLOPS, and latency because of its simple, unified structure and parallel token processing. Due to its extensive use of attention, it is robust on arbitrarily-oriented text, which is common in real-world images. Code, pretrained weights, and data are available at: https://github.com/baudm/parseq.
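The abstract names Permutation Language Modeling but gives no implementation details. As a rough, generic illustration of that idea (not the authors' code, which is at the repository linked above), the sketch below turns a factorization order into a visibility mask: a position may use as context only the positions that come earlier in that order. The function name `permutation_mask` and the list-of-lists mask layout are assumptions made for this example.

```python
def permutation_mask(order):
    """Build a visibility mask from a factorization order.

    `order` is a permutation of range(n) listing positions in the order
    they are predicted. visible[i][j] is True when position j may serve
    as context for predicting position i, i.e. j precedes i in `order`.
    Hypothetical helper for illustration only.
    """
    n = len(order)
    # rank[pos] = step at which position `pos` is predicted
    rank = [0] * n
    for step, pos in enumerate(order):
        rank[pos] = step
    return [[rank[j] < rank[i] for j in range(n)] for i in range(n)]


# Left-to-right order gives the usual strictly lower-triangular (causal) mask.
left_to_right = permutation_mask([0, 1, 2])
# Right-to-left order gives the mirrored, strictly upper-triangular mask.
right_to_left = permutation_mask([2, 1, 0])
# Any other permutation gives a mask that mixes left and right context.
mixed = permutation_mask([1, 2, 0])
```

Under this reading, training one set of weights with masks drawn from several orders is what lets a single model act as many autoregressive models at once; how PARSeq selects orders and handles start and end tokens is not stated in this record.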
Pages: 178-196
Number of pages: 19
Related papers
50 in total
  • [1] Masked and Permuted Implicit Context Learning for Scene Text Recognition
    Yang, Xiaomeng
    Qiao, Zhi
    Wei, Jin
    Yang, Dongbao
    Zhou, Yu
    IEEE SIGNAL PROCESSING LETTERS, 2024, 31 : 964 - 968
  • [2] Scene Text Recognition Using Permutated Autoregressive Sequence Model and YOLOv8
    Ari, Berna Gurler
    Comert, Zafer
    32ND IEEE SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE, SIU 2024, 2024,
  • [3] Display-Semantic Transformer for Scene Text Recognition
    Yang, Xinqi
    Silamu, Wushour
    Xu, Miaomiao
    Li, Yanbing
    SENSORS, 2023, 23 (19)
  • [4] SCENE TEXT RECOGNITION MODELS EXPLAINABILITY USING LOCAL FEATURES
    Ty, Mark Vincent
    Atienza, Rowel
    2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2023, : 645 - 649
  • [5] Scene Text Recognition with Multi-Encoders
    Wang, Yao
    Ha, Jong-Eun
    2022 22ND INTERNATIONAL CONFERENCE ON CONTROL, AUTOMATION AND SYSTEMS (ICCAS 2022), 2022, : 1615 - 1620
  • [6] Lightweight Scene Text Recognition Based on Transformer
    Luan, Xin
    Zhang, Jinwei
    Xu, Miaomiao
    Silamu, Wushouer
    Li, Yanbing
    SENSORS, 2023, 23 (09)
  • [7] LCSTR: Scene Text Recognition with Large Convolutional Kernels
    Wang, Jiale
    Yang, Lina
    Wang, Jing
    Yang, Haoyan
    Bai, Lin
    Wang, Patrick Shen-Pei
    Li, Xichun
    Lu, Huiwu
    Xu, Huafu
    INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2024, 38 (01)
  • [8] Pure Transformer with Integrated Experts for Scene Text Recognition
    Tan, Yew Lee
    Kong, Adams Wai-Kin
    Kim, Jung-Jae
    COMPUTER VISION - ECCV 2022, PT XXVIII, 2022, 13688 : 481 - 497
  • [9] A Transformer-Based Framework for Scene Text Recognition
    Selvam, Prabu
    Koilraj, Joseph Abraham Sundar
    Tavera Romero, Carlos Andres
    Alharbi, Meshal
    Mehbodniya, Abolfazl
    Webber, Julian L.
    Sengan, Sudhakar
    IEEE ACCESS, 2022, 10 : 100895 - 100910
  • [10] Vision Transformer for Fast and Efficient Scene Text Recognition
    Atienza, Rowel
    DOCUMENT ANALYSIS AND RECOGNITION - ICDAR 2021, PT I, 2021, 12821 : 319 - 334