Attention-based Text Recognition in the Wild

Times cited: 0
Authors
Yan, Zhi-Chen [1]
Yu, Stephanie A. [2]
Affiliations
[1] Facebook Research, 1 Hacker Way, Menlo Park, CA 94025, USA
[2] West Island School, 250 Victoria Road, Pokfulam, Hong Kong, People's Republic of China
Source
Proceedings of the 1st International Conference on Deep Learning Theory and Applications (DeLTA) | 2020
Keywords
Attention; Convolution; Deep Learning; LSTM; Text Recognition
DOI
10.5220/0009970200420049
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recognizing text in real-world scenes is an important research topic in computer vision, and many deep-learning-based techniques have been proposed for it. Such techniques typically follow an encoder-decoder architecture and use a sequence of feature vectors as the intermediate representation. With this approach, useful 2D spatial information in the input image may be lost during vector-based encoding. In this paper, we formulate scene text recognition as a spatiotemporal sequence translation problem and introduce a novel attention-based spatiotemporal decoding framework. We first encode an image as a spatiotemporal sequence, which this decoder then translates into a sequence of output characters. The encoding and decoding stages are integrated into a single end-to-end trainable deep network. Experimental results on multiple benchmarks, including IIIT5k, SVT, ICDAR and RCTW-17, indicate that our method can significantly outperform conventional attention frameworks.
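The abstract contrasts collapsing an image's 2D feature map into a 1D sequence of vectors with a decoder that can still see spatial layout. The sketch below illustrates that distinction using generic Bahdanau-style additive attention over an H x W grid of encoder features. It is not the authors' spatiotemporal decoder, which is not specified in this record; the function names, the shapes, and the weight names (w_f, w_h, v) are hypothetical, and the weights are random rather than trained.

```python
import numpy as np


def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def collapse_to_sequence(features):
    # Conventional 1D encoding (hypothetical helper): fold the height axis into
    # the channels, giving one vector per image column (W steps of size H*C).
    h, w, c = features.shape
    return features.transpose(1, 0, 2).reshape(w, h * c)


def additive_attention_2d(features, hidden, w_f, w_h, v):
    """One additive (Bahdanau-style) attention step over a 2D feature map.

    features: (H, W, C) encoder output; hidden: (D,) decoder state.
    w_f: (C, A), w_h: (D, A), v: (A,) are hypothetical learned weights.
    Returns a (C,) context vector and an (H, W) attention map.
    """
    h, w, c = features.shape
    cells = features.reshape(h * w, c)                 # one row per spatial location
    scores = np.tanh(cells @ w_f + hidden @ w_h) @ v   # (H*W,) alignment scores
    alpha = softmax(scores)                            # weights sum to 1 over the grid
    context = alpha @ cells                            # weighted sum of cell features
    return context, alpha.reshape(h, w)


# Usage with random, untrained weights (illustrative shapes only).
rng = np.random.default_rng(0)
H, W, C, D, A = 4, 16, 8, 32, 10
feats = rng.standard_normal((H, W, C))
state = rng.standard_normal(D)
ctx, attn = additive_attention_2d(
    feats, state,
    rng.standard_normal((C, A)), rng.standard_normal((D, A)), rng.standard_normal(A),
)
print(collapse_to_sequence(feats).shape)  # (16, 32)
print(ctx.shape, attn.shape)              # (8,) (4, 16)
```

In the 1D path, vertical position survives only implicitly as channel order inside each column vector, whereas the attention map keeps an explicit weight for every (row, column) cell. This is one way to picture the 2D information that, per the abstract, a vector-based encoding may lose.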
Pages: 42-49
Page count: 8
References
29 entries in total
[1] Bahdanau D. arXiv, 2016. arXiv:1409.0473, DOI 10.48550/arXiv.1409.0473.
[2] Bissacco A, Cummins M, Netzer Y, Neven H. PhotoOCR: Reading Text in Uncontrolled Conditions. 2013 IEEE International Conference on Computer Vision (ICCV), 2013, pp. 785-792.
[3] Cheng Z, Bai F, Xu Y, Zheng G, Pu S, Zhou S. Focusing Attention: Towards Accurate Text Recognition in Natural Images. 2017 IEEE International Conference on Computer Vision (ICCV), 2017, pp. 5086-5094.
[4] Graves A. Proceedings of the 23rd International Conference on Machine Learning (ICML), 2006, p. 369.
[5] Gupta A, Vedaldi A, Zisserman A. Synthetic Data for Text Localisation in Natural Images. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 2315-2324.
[6] He P. AAAI Conference on Artificial Intelligence, 2016, p. 3501.
[7] Hochreiter S. Neural Computation, 1997, vol. 9, p. 1735.
[8] Jaderberg M. Proceedings of the 28th International Conference on Neural Information Processing Systems (NIPS'15), 2015. DOI 10.48550/arXiv.1506.02025.
[9] Jaderberg M. arXiv, 2014. arXiv:1406.2227.
[10] Jaderberg M, Simonyan K, Vedaldi A, Zisserman A. Reading Text in the Wild with Convolutional Neural Networks. International Journal of Computer Vision, 2016, 116(1): 1-20.