SaHAN: Scale-aware hierarchical attention network for scene text recognition

Cited by: 7
Authors
Zhang, Jiaxin [1]
Luo, Canjie [1]
Jin, Lianwen [1, 2]
Wang, Tianwei [1]
Li, Ziyan [1, 2]
Zhou, Weiying [1]
Affiliations
[1] South China Univ Technol, Sch Elect & Informat Engn, Guangzhou 510000, Peoples R China
[2] SCUT Zhuhai Inst Modern Ind Innovat, Zhuhai 519000, Peoples R China
Keywords
Scene text recognition; Character scale-variation problem; Multi-scale features; Hierarchical attention decoder
DOI
10.1016/j.patrec.2020.06.009
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Scene text recognition has become a research hotspot owing to its abundant semantic information and various applications. Recent scene text recognition methods usually focus on handling shape distortion, attention drift, or background noise, while overlooking the character scale-variation problem that text recognition also encounters. To address this issue, we propose a new scale-aware hierarchical attention network (SaHAN) for scene text recognition. Inspired by the feature pyramid network, we exploit the inherent pyramidal structure of a deep convolutional network to retain multi-scale features for flexible receptive fields. We then construct a hierarchical attention decoder that applies the attention mechanism twice on the multi-scale features to collect the most fine-grained information for prediction. SaHAN is trained in a weakly supervised manner, requiring only images and their corresponding text labels. Extensive experiments on seven benchmarks show that SaHAN achieves state-of-the-art performance. © 2020 Elsevier B.V. All rights reserved.
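The idea of applying attention twice on multi-scale features can be illustrated with a minimal sketch. The abstract does not specify how the two attention passes are arranged, so everything below is an assumption made for illustration: the first pass attends over positions within each scale, the second pass attends over the resulting per-scale summaries, and plain dot-product scoring is used. The names `attend` and `hierarchical_attention` are hypothetical and do not come from the paper.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(query, vectors):
    """Dot-product attention: return (weights, weighted sum of vectors)."""
    weights = softmax([dot(query, v) for v in vectors])
    dim = len(vectors[0])
    context = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]
    return weights, context

def hierarchical_attention(query, multi_scale_features):
    """Two attention passes (an assumed arrangement, not the paper's exact design).

    multi_scale_features: one list of feature vectors per scale.
    """
    # First pass: summarize each scale by attending over its positions.
    glimpses = [attend(query, feats)[1] for feats in multi_scale_features]
    # Second pass: attend over the per-scale summaries.
    scale_weights, context = attend(query, glimpses)
    return scale_weights, context

# Toy input: a decoder query and two scales of 2-dimensional features.
query = [1.0, 0.0]
features = [
    [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.0, 0.0]],  # finer scale, 4 positions
    [[2.0, 0.0], [0.0, 2.0]],                          # coarser scale, 2 positions
]
scale_weights, context = hierarchical_attention(query, features)
```

In this toy input the coarser scale contains the feature that best matches the query, so the second pass gives it the larger weight. A trained model would learn the scoring functions rather than use raw dot products.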
Pages: 205-211
Number of pages: 7