Scene Text Recognition with Permuted Autoregressive Sequence Models

Cited by: 78

Authors:

Bautista, Darwin [1]

Atienza, Rowel [1]

Affiliations:

[1] Univ Philippines, Elect & Elect Engn Inst, Quezon City, Philippines

Source:

COMPUTER VISION - ECCV 2022, PT XXVIII | 2022 / Vol. 13688

Keywords:

Scene text recognition; Permutation language modeling; Autoregressive modeling; Cross-modal attention; Transformer;

DOI:

10.1007/978-3-031-19815-1_11

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Context-aware STR methods typically use internal autoregressive (AR) language models (LM). Inherent limitations of AR models motivated two-stage methods which employ an external LM. The conditional independence of the external LM on the input image may cause it to erroneously rectify correct predictions, leading to significant inefficiencies. Our method, PARSeq, learns an ensemble of internal AR LMs with shared weights using Permutation Language Modeling. It unifies context-free non-AR and context-aware AR inference, and iterative refinement using bidirectional context. Using synthetic training data, PARSeq achieves state-of-the-art (SOTA) results in STR benchmarks (91.9% accuracy) and more challenging datasets. It establishes new SOTA results (96.0% accuracy) when trained on real data. PARSeq is optimal on accuracy vs parameter count, FLOPS, and latency because of its simple, unified structure and parallel token processing. Due to its extensive use of attention, it is robust on arbitrarily-oriented text, which is common in real-world images. Code, pretrained weights, and data are available at: https://github.com/baudm/parseq.

Pages: 178-196

Page count: 19
