Scene Text Recognition with Permuted Autoregressive Sequence Models

Cited by: 78
Authors
Bautista, Darwin [1]
Atienza, Rowel [1]
Affiliations
[1] Univ Philippines, Elect & Elect Engn Inst, Quezon City, Philippines
Source
COMPUTER VISION - ECCV 2022, PT XXVIII | 2022 / Vol. 13688
Keywords
Scene text recognition; Permutation language modeling; Autoregressive modeling; Cross-modal attention; Transformer
DOI
10.1007/978-3-031-19815-1_11
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Context-aware STR methods typically use internal autoregressive (AR) language models (LM). Inherent limitations of AR models motivated two-stage methods which employ an external LM. The conditional independence of the external LM on the input image may cause it to erroneously rectify correct predictions, leading to significant inefficiencies. Our method, PARSeq, learns an ensemble of internal AR LMs with shared weights using Permutation Language Modeling. It unifies context-free non-AR and context-aware AR inference, and iterative refinement using bidirectional context. Using synthetic training data, PARSeq achieves state-of-the-art (SOTA) results in STR benchmarks (91.9% accuracy) and more challenging datasets. It establishes new SOTA results (96.0% accuracy) when trained on real data. PARSeq is optimal on accuracy vs parameter count, FLOPS, and latency because of its simple, unified structure and parallel token processing. Due to its extensive use of attention, it is robust on arbitrarily-oriented text, which is common in real-world images. Code, pretrained weights, and data are available at: https://github.com/baudm/parseq.
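The abstract's idea of an ensemble of AR models with shared weights can be illustrated with the attention masks that different prediction orders imply. The following is a minimal sketch of generic permutation language modeling, not the authors' implementation; the function name `permutation_mask` and the toy sequence length are assumptions made for illustration only.

```python
def permutation_mask(perm):
    """Build a visibility mask for one prediction order.

    perm lists the sequence positions in the order they are predicted.
    mask[i][j] is True when position i may use position j as context,
    that is, when j is predicted earlier than i in this order.
    (Hypothetical helper for illustration, not taken from PARSeq.)
    """
    n = len(perm)
    rank = [0] * n
    for step, pos in enumerate(perm):
        rank[pos] = step  # step at which each position is predicted
    return [[rank[j] < rank[i] for j in range(n)] for i in range(n)]


# The same model weights can be trained under several orders.
left_to_right = permutation_mask([0, 1, 2])  # standard AR order
right_to_left = permutation_mask([2, 1, 0])  # reversed AR order
```

In the left-to-right case each position sees only the positions before it, which is the usual causal mask; other orders expose context from both sides, which is what makes bidirectional refinement possible in this family of models.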
Pages: 178-196
Page count: 19
Related papers
50 in total
  • [21] Symmetrical Linguistic Feature Distillation with CLIP for Scene Text Recognition
    Wang, Zixiao
    Xie, Hongtao
    Wang, Yuxin
    Xu, Jianjun
    Zhang, Boqiang
    Zhang, Yongdong
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 509 - 518
  • [22] Text Font Correction and Alignment Method for Scene Text Recognition
    Ding, Liuxu
    Liu, Yuefeng
    Zhao, Qiyan
    Liu, Yunong
    SENSORS, 2024, 24 (24)
  • [23] Text-Level Contrastive Learning for Scene Text Recognition
    Zhuang, Junbin
    Ren, Yixuan
    Li, Xia
    Liang, Zhanpeng
    2022 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP 2022), 2022, : 231 - 236
  • [24] End-to-end scene text recognition using tree-structured models
    Shi, Cunzhao
    Wang, Chunheng
    Xiao, Baihua
    Gao, Song
    Hu, Jinlong
    PATTERN RECOGNITION, 2014, 47 (09) : 2853 - 2866
  • [25] Video Scene Text Frames Categorization for Text Detection and Recognition
    Qin, Longfei
    Shivakumara, Palaiahnakote
    Lu, Tong
    Pal, Umapada
    Tan, Chew Lim
    2016 23RD INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2016, : 3886 - 3891
  • [26] SCENE TEXT RECOGNITION IN MULTIPLE FRAMES BASED ON TEXT TRACKING
    Rong, Xuejian
    Yi, Chucai
    Yang, Xiaodong
    Tian, Yingli
    2014 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2014,
  • [27] Towards Scene Text Recognition with Genetic Programming
    Barlow, Brendan
    Song, Andy
    2013 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION (CEC), 2013, : 1310 - 1317
  • [28] Instruction-Guided Scene Text Recognition
    Du, Yongkun
    Chen, Zhineng
    Su, Yuchen
    Jia, Caiyan
    Jiang, Yu-Gang
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2025, 47 (04) : 2723 - 2738
  • [29] DIFFUSIONSTR: DIFFUSION MODEL FOR SCENE TEXT RECOGNITION
    Fujitake, Masato
    2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2023, : 1585 - 1589
  • [30] Dual Relation Network for Scene Text Recognition
    Li, Ming
    Fu, Bin
    Chen, Han
    He, Junjun
    Qiao, Yu
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 4094 - 4107