Attention-based Text Recognition in the Wild

Citations: 0
Authors
Yan, Zhi-Chen [1]
Yu, Stephanie A. [2]
Affiliations
[1] Facebook Research, 1 Hacker Way, Menlo Park, CA 94025, USA
[2] West Island School, 250 Victoria Rd, Pokfulam, Hong Kong, People's Republic of China
Source
PROCEEDINGS OF THE 1ST INTERNATIONAL CONFERENCE ON DEEP LEARNING THEORY AND APPLICATIONS (DELTA) | 2020
Keywords
Attention; Convolution; Deep Learning; LSTM; Text Recognition;
DOI
10.5220/0009970200420049
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recognizing text in real-world scenes is an important research topic in computer vision. Many deep-learning-based techniques have been proposed. Such techniques typically follow an encoder-decoder architecture and use a sequence of feature vectors as the intermediate representation. In this approach, useful 2D spatial information in the input image may be lost due to vector-based encoding. In this paper, we formulate scene text recognition as a spatiotemporal sequence translation problem and introduce a novel attention-based spatiotemporal decoding framework. We first encode an image as a spatiotemporal sequence, which is then translated into a sequence of output characters using the aforementioned decoder. Our encoding and decoding stages are integrated to form an end-to-end trainable deep network. Experimental results on multiple benchmarks, including IIIT5k, SVT, ICDAR and RCTW-17, indicate that our method can significantly outperform conventional attention frameworks.
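The abstract's point about 2D spatial information can be illustrated with a minimal sketch of one attention step over a 2D feature grid, as opposed to a 1D sequence of feature vectors. This is generic dot-product attention, not the paper's decoder; the function name `spatial_attention`, the dot-product scoring, and the nested-list data layout are assumptions made only for illustration.

```python
import math

def spatial_attention(features, query):
    """One attention step over an H x W grid of C-dimensional feature vectors.

    Hypothetical helper, not taken from the paper: features is a nested list
    [H][W][C], query is a list of length C (e.g. a decoder state).
    Returns (weights as an H x W grid, context vector of length C).
    """
    # Dot-product score for every spatial location; the 2D layout is kept.
    scores = [[sum(f * q for f, q in zip(cell, query)) for cell in row]
              for row in features]
    # Softmax over all H*W locations (subtract the max for numerical stability).
    peak = max(max(row) for row in scores)
    exps = [[math.exp(s - peak) for s in row] for row in scores]
    total = sum(sum(row) for row in exps)
    weights = [[e / total for e in row] for row in exps]
    # Context vector: attention-weighted sum of the feature vectors.
    context = [0.0] * len(query)
    for weight_row, feature_row in zip(weights, features):
        for w, cell in zip(weight_row, feature_row):
            for c, value in enumerate(cell):
                context[c] += w * value
    return weights, context
```

In a full recognizer, a step like this would typically be repeated once per output character, with the query taken from a recurrent decoder state; how the paper's spatiotemporal decoder does this is not specified in the abstract.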
Pages: 42-49
Page count: 8