Attention-based Text Recognition in the Wild

Cited by: 0

Authors:

Yan, Zhi-Chen ^{[1]}

Yu, Stephanie A. ^{[2]}

Affiliations:

[1] Facebook Res, 1 Hacker Way, Menlo Pk, CA 94025 USA

[2] West Isl Sch, Pokfulam, 250 Victoria Rd, Hong Kong, Peoples R China

Source:

PROCEEDINGS OF THE 1ST INTERNATIONAL CONFERENCE ON DEEP LEARNING THEORY AND APPLICATIONS (DELTA) | 2020

Keywords:

Attention; Convolution; Deep Learning; LSTM; Text Recognition;

DOI:

10.5220/0009970200420049

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Recognizing text in real-world scenes is an important research topic in computer vision, and many deep-learning-based techniques have been proposed for it. Such techniques typically follow an encoder-decoder architecture and use a sequence of feature vectors as the intermediate representation. With this approach, useful 2D spatial information in the input image may be lost because of the vector-based encoding. In this paper, we formulate scene text recognition as a spatiotemporal sequence translation problem and introduce a novel attention-based spatiotemporal decoding framework. We first encode an image as a spatiotemporal sequence, which the decoder then translates into a sequence of output characters. Our encoding and decoding stages are integrated to form an end-to-end trainable deep network. Experimental results on multiple benchmarks, including IIIT5k, SVT, ICDAR and RCTW-17, indicate that our method can significantly outperform conventional attention frameworks.
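The contrast the abstract draws, between attending over a 1D sequence of feature vectors and keeping 2D spatial information, can be illustrated with a minimal sketch. The code below is not the authors' network: it is a generic additive attention step applied jointly over every position of an H x W feature grid, written in plain Python. The function name `spatial_attention`, the parameters `W_f`, `W_s`, `v`, and all dimensions are hypothetical and chosen only for illustration.

```python
import math
import random


def spatial_attention(features, state, W_f, W_s, v):
    """Additive attention over every position of a 2D feature grid.

    features: H x W grid, each cell a list of C floats (encoder output)
    state:    list of D floats (current decoder hidden state)
    W_f (C x A), W_s (D x A), v (A): hypothetical learned parameters
    Returns (H x W attention weights, C-dimensional context vector).
    """
    A = len(v)
    # Project the decoder state once; it is shared by all positions.
    s_proj = [sum(state[d] * W_s[d][a] for d in range(len(state)))
              for a in range(A)]
    scores = []
    for row in features:
        row_scores = []
        for f in row:
            f_proj = [sum(f[c] * W_f[c][a] for c in range(len(f)))
                      for a in range(A)]
            row_scores.append(
                sum(v[a] * math.tanh(f_proj[a] + s_proj[a]) for a in range(A)))
        scores.append(row_scores)
    # Softmax jointly over all H*W positions, so height is never collapsed.
    top = max(max(r) for r in scores)
    exps = [[math.exp(s - top) for s in r] for r in scores]
    total = sum(sum(r) for r in exps)
    weights = [[e / total for e in r] for r in exps]
    # Context vector: attention-weighted sum of the feature vectors.
    H, W, C = len(features), len(features[0]), len(features[0][0])
    context = [sum(weights[i][j] * features[i][j][c]
                   for i in range(H) for j in range(W))
               for c in range(C)]
    return weights, context


def rand_matrix(rows, cols):
    """Random stand-in for a learned parameter matrix (illustration only)."""
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
H, W, C, D, A = 4, 16, 8, 6, 5  # assumed sizes, not taken from the paper
features = [rand_matrix(W, C) for _ in range(H)]
state = [random.gauss(0.0, 1.0) for _ in range(D)]
weights, context = spatial_attention(
    features, state, rand_matrix(C, A), rand_matrix(D, A),
    [random.gauss(0.0, 1.0) for _ in range(A)])
```

In a decoder of this general kind, a step like this would run once per output character, with `context` fed to a recurrent unit such as an LSTM; how the paper's spatiotemporal decoder actually combines these pieces is not specified in the abstract.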


Pages: 42-49

Page count: 8
