Scene Text Recognition with Permuted Autoregressive Sequence Models

Cited by: 78
Authors
Bautista, Darwin [1 ]
Atienza, Rowel [1 ]
Affiliations
[1] Univ Philippines, Elect & Elect Engn Inst, Quezon City, Philippines
Source
COMPUTER VISION - ECCV 2022, PT XXVIII | 2022 / Vol. 13688
Keywords
Scene text recognition; Permutation language modeling; Autoregressive modeling; Cross-modal attention; Transformer;
DOI
10.1007/978-3-031-19815-1_11
CLC Number
TP18 [Artificial intelligence theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Context-aware STR methods typically use internal autoregressive (AR) language models (LMs). Inherent limitations of AR models motivated two-stage methods that employ an external LM. Because the external LM is conditionally independent of the input image, it may erroneously rectify correct predictions, leading to significant inefficiencies. Our method, PARSeq, learns an ensemble of internal AR LMs with shared weights using Permutation Language Modeling. It unifies context-free non-AR and context-aware AR inference, as well as iterative refinement using bidirectional context. Trained on synthetic data, PARSeq achieves state-of-the-art (SOTA) results on STR benchmarks (91.9% accuracy) and on more challenging datasets, and it establishes new SOTA results (96.0% accuracy) when trained on real data. PARSeq is optimal on the accuracy vs. parameter count, FLOPS, and latency trade-offs because of its simple, unified structure and parallel token processing. Due to its extensive use of attention, it is robust to arbitrarily oriented text, which is common in real-world images. Code, pretrained weights, and data are available at: https://github.com/baudm/parseq.
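The abstract's core idea, an ensemble of shared-weight AR LMs obtained via Permutation Language Modeling, amounts to training one decoder under many decoding orders, with each order realized as an attention mask. The following is a minimal sketch of that mask construction, not the authors' implementation; `perm_attention_mask` is a hypothetical helper written for illustration:

```python
import numpy as np

def perm_attention_mask(perm):
    """Build an attention mask for an arbitrary decoding order.

    perm[k] is the token position decoded at step k. The token at
    perm[k] may attend only to tokens decoded at earlier steps,
    i.e. perm[0..k-1] (no attention to its own position, as in the
    query stream of permutation language modeling).

    Returns a boolean matrix M where M[i, j] = True iff the token at
    position i may attend to the token at position j.
    """
    T = len(perm)
    mask = np.zeros((T, T), dtype=bool)
    for k in range(T):
        for j in range(k):
            mask[perm[k], perm[j]] = True
    return mask

# Left-to-right order yields a strictly lower-triangular mask,
# i.e. ordinary AR decoding:
lr = perm_attention_mask([0, 1, 2, 3])
# Right-to-left order yields the mirrored (strictly upper-triangular)
# mask of a reversed AR LM; sampling random permutations during
# training is what lets one decoder act as an ensemble of AR LMs.
rl = perm_attention_mask([3, 2, 1, 0])
```

Training with many sampled permutations (rather than all T! of them) is what makes the single decoder usable for left-to-right AR decoding, non-AR parallel decoding, and bidirectional iterative refinement at inference time.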
Pages: 178-196 (19 pages)
Related Papers
(50 records)
  • [41] Transformer-based end-to-end scene text recognition
    Zhu, Xinghao
    Zhang, Zhi
    PROCEEDINGS OF THE 2021 IEEE 16TH CONFERENCE ON INDUSTRIAL ELECTRONICS AND APPLICATIONS (ICIEA 2021), 2021, : 1691 - 1695
  • [43] STR Transformer: A Cross-domain Transformer for Scene Text Recognition
    Wu, Xing
    Tang, Bin
    Zhao, Ming
    Wang, Jianjia
    Guo, Yike
    APPLIED INTELLIGENCE, 2023, 53 (03) : 3444 - 3458
  • [44] Image-to-Character-to-Word Transformers for Accurate Scene Text Recognition
    Xue, Chuhui
    Huang, Jiaxing
    Zhang, Wenqing
    Lu, Shijian
    Wang, Changhu
    Bai, Song
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (11) : 12908 - 12921
  • [45] An End-to-End Trainable Neural Network for Image-Based Sequence Recognition and Its Application to Scene Text Recognition
    Shi, Baoguang
    Bai, Xiang
    Yao, Cong
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2017, 39 (11) : 2298 - 2304
  • [46] CATNet: Scene Text Recognition Guided by Concatenating Augmented Text Features
    Zhang, Ziyin
    Pan, Lemeng
    Du, Lin
    Li, Qingrui
    Lu, Ning
    DOCUMENT ANALYSIS AND RECOGNITION - ICDAR 2021, PT I, 2021, 12821 : 350 - 365
  • [47] Hybrid Autoregressive and Non-Autoregressive Transformer Models for Speech Recognition
    Tian, Zhengkun
    Yi, Jiangyan
    Tao, Jianhua
    Zhang, Shuai
    Wen, Zhengqi
    IEEE SIGNAL PROCESSING LETTERS, 2022, 29 : 762 - 766
  • [48] Background-Insensitive Scene Text Recognition with Text Semantic Segmentation
    Zhao, Liang
    Wu, Zhenyao
    Wu, Xinyi
    Wilsbacher, Greg
    Wang, Song
    COMPUTER VISION, ECCV 2022, PT XXV, 2022, 13685 : 163 - 182
  • [49] Reading scene text with fully convolutional sequence modeling
    Gao, Yunze
    Chen, Yingying
    Wang, Jinqiao
    Tang, Ming
    Lu, Hanqing
    NEUROCOMPUTING, 2019, 339 : 161 - 170
  • [50] Multi-granularity Prediction for Scene Text Recognition
    Wang, Peng
    Da, Cheng
    Yao, Cong
    COMPUTER VISION - ECCV 2022, PT XXVIII, 2022, 13688 : 339 - 355