Scene Text Recognition with Permuted Autoregressive Sequence Models

Cited by: 78
Authors
Bautista, Darwin [1]
Atienza, Rowel [1]
Affiliation
[1] University of the Philippines, Electrical and Electronics Engineering Institute, Quezon City, Philippines
Source
COMPUTER VISION - ECCV 2022, PT XXVIII | 2022 / Vol. 13688
Keywords
Scene text recognition; Permutation language modeling; Autoregressive modeling; Cross-modal attention; Transformer;
DOI
10.1007/978-3-031-19815-1_11
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Context-aware STR methods typically use internal autoregressive (AR) language models (LM). Inherent limitations of AR models motivated two-stage methods which employ an external LM. The conditional independence of the external LM on the input image may cause it to erroneously rectify correct predictions, leading to significant inefficiencies. Our method, PARSeq, learns an ensemble of internal AR LMs with shared weights using Permutation Language Modeling. It unifies context-free non-AR and context-aware AR inference, and iterative refinement using bidirectional context. Using synthetic training data, PARSeq achieves state-of-the-art (SOTA) results in STR benchmarks (91.9% accuracy) and more challenging datasets. It establishes new SOTA results (96.0% accuracy) when trained on real data. PARSeq is optimal on accuracy vs parameter count, FLOPS, and latency because of its simple, unified structure and parallel token processing. Due to its extensive use of attention, it is robust on arbitrarily-oriented text, which is common in real-world images. Code, pretrained weights, and data are available at: https://github.com/baudm/parseq.
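As a rough illustration of the Permutation Language Modeling idea named in the abstract, the sketch below derives an attention mask from a factorization order: each position may condition only on positions that come earlier in that order. This is a simplified toy, not the authors' implementation. The function name `permutation_attention_mask` is hypothetical, and special tokens (such as a begin-of-sequence token) and the image features are left out.

```python
def permutation_attention_mask(perm):
    """Build a context mask from one factorization order (illustrative only).

    perm lists sequence positions in the order they are predicted.
    allowed[q][k] is True when position q may use position k as context,
    i.e. when k is predicted before q under this ordering.
    """
    n = len(perm)
    # rank[p] = step at which position p is predicted
    rank = [0] * n
    for step, pos in enumerate(perm):
        rank[pos] = step
    return [[rank[k] < rank[q] for k in range(n)] for q in range(n)]


# Left-to-right order gives the familiar causal (lower-triangular) mask.
left_to_right = permutation_attention_mask([0, 1, 2])

# Right-to-left order gives the mirrored (upper-triangular) mask.
right_to_left = permutation_attention_mask([2, 1, 0])

# A mixed order: position 1 first, then 2, then 0.
mixed = permutation_attention_mask([1, 2, 0])
```

Because every ordering only changes the mask, one set of model weights can be trained under several orderings, which is one way to read the abstract's "ensemble of internal AR LMs with shared weights".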
Pages: 178 - 196
Page count: 19