Scene Text Recognition with Permuted Autoregressive Sequence Models

Cited by: 78

Authors:

Bautista, Darwin [1]

Atienza, Rowel [1]

Affiliations:

[1] Univ Philippines, Elect & Elect Engn Inst, Quezon City, Philippines

Source:

COMPUTER VISION - ECCV 2022, PT XXVIII | 2022 / Vol. 13688

Keywords:

Scene text recognition; Permutation language modeling; Autoregressive modeling; Cross-modal attention; Transformer;

DOI:

10.1007/978-3-031-19815-1_11

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Context-aware STR methods typically use internal autoregressive (AR) language models (LM). Inherent limitations of AR models motivated two-stage methods which employ an external LM. The conditional independence of the external LM on the input image may cause it to erroneously rectify correct predictions, leading to significant inefficiencies. Our method, PARSeq, learns an ensemble of internal AR LMs with shared weights using Permutation Language Modeling. It unifies context-free non-AR and context-aware AR inference, and iterative refinement using bidirectional context. Using synthetic training data, PARSeq achieves state-of-the-art (SOTA) results in STR benchmarks (91.9% accuracy) and more challenging datasets. It establishes new SOTA results (96.0% accuracy) when trained on real data. PARSeq is optimal on accuracy vs parameter count, FLOPS, and latency because of its simple, unified structure and parallel token processing. Due to its extensive use of attention, it is robust on arbitrarily-oriented text, which is common in real-world images. Code, pretrained weights, and data are available at: https://github.com/baudm/parseq.
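The general idea behind permutation language modeling, which the abstract names, can be illustrated with a small sketch: each factorization order of the sequence positions induces its own attention mask, and one set of weights can be trained against many such masks. The code below is only an illustration of that general idea, not the authors' implementation. The function name `permutation_mask` and the toy 3-position sequence are hypothetical, and details such as special tokens or which permutations are sampled are deliberately left out.

```python
def permutation_mask(order):
    """Build a context mask from a factorization order.

    `order` lists sequence positions in the order they are predicted.
    mask[i][j] is True when position i may use position j as context,
    i.e. when j is predicted earlier than i under `order`.
    (Hypothetical helper for illustration only.)
    """
    n = len(order)
    # rank[pos] = the step at which position `pos` is predicted
    rank = {pos: step for step, pos in enumerate(order)}
    return [[rank[j] < rank[i] for j in range(n)] for i in range(n)]


# Standard left-to-right autoregressive order: each position sees its left context.
left_to_right = permutation_mask([0, 1, 2])

# Reversed order: each position sees its right context instead.
right_to_left = permutation_mask([2, 1, 0])

# An arbitrary order: position 1 first, then 2, then 0.
mixed = permutation_mask([1, 2, 0])

for name, mask in [("left_to_right", left_to_right),
                   ("right_to_left", right_to_left),
                   ("mixed", mixed)]:
    print(name)
    for row in mask:
        print("  ", ["x" if allowed else "." for allowed in row])
```

Under this simplification, the left-to-right order reproduces the usual causal mask, while the other orders expose different subsets of context to each position, which is one way to read the abstract's phrase "an ensemble of internal AR LMs with shared weights".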


Pages: 178-196

Number of pages: 19

