SaHAN: Scale-aware hierarchical attention network for scene text recognition

Cited by: 7
Authors
Zhang, Jiaxin [1]
Luo, Canjie [1]
Jin, Lianwen [1, 2]
Wang, Tianwei [1]
Li, Ziyan [1, 2]
Zhou, Weiying [1]
Affiliations
[1] South China Univ Technol, Sch Elect & Informat Engn, Guangzhou 510000, Peoples R China
[2] SCUT Zhuhai Inst Modern Ind Innovat, Zhuhai 519000, Peoples R China
Keywords
Scene text recognition; Character scale-variation problem; Multi-scale features; Hierarchical attention decoder
DOI
10.1016/j.patrec.2020.06.009
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Scene text recognition has become a research hotspot owing to its abundant semantic information and various applications. Recent scene text recognition methods usually focus on handling shape distortion, attention drift, or background noise, overlooking the character scale-variation problem that text recognition also encounters. To address this issue, we propose a new scale-aware hierarchical attention network (SaHAN) for scene text recognition. Inspired by the feature pyramid network, we exploit the inherent pyramidal structure of a deep convolutional network to retain multi-scale features for flexible receptive fields. Then, we construct a hierarchical attention decoder that performs the attention mechanism twice on multi-scale features to collect the most fine-grained information for prediction. SaHAN is trained in a weakly supervised manner, requiring only images and their corresponding text labels. Extensive experiments on seven benchmarks show that SaHAN achieves state-of-the-art performance. © 2020 Elsevier B.V. All rights reserved.
Pages: 205-211
Page count: 7
Related papers
44 entries in total
  • [11] Graves A., 2006, ICML P 23 INT C MACH, DOI DOI 10.1145/1143844.1143891
  • [12] Thigh fracture detection using deep learning method based on new dilated convolutional feature pyramid network
    Guan, Bin
    Yao, Jinkun
    Zhang, Guoshan
    Wang, Xinbo
    [J]. PATTERN RECOGNITION LETTERS, 2019, 125 : 521 - 526
  • [13] Synthetic Data for Text Localisation in Natural Images
    Gupta, Ankush
    Vedaldi, Andrea
    Zisserman, Andrew
    [J]. 2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016: 2315 - 2324
  • [14] He K, 2015, ABS151203385 CORR, V77, P10437
  • [15] Jaderberg M., 2014, P ADV NEURAL INFORM
  • [16] Reading Text in the Wild with Convolutional Neural Networks
    Jaderberg, Max
    Simonyan, Karen
    Vedaldi, Andrea
    Zisserman, Andrew
    [J]. INTERNATIONAL JOURNAL OF COMPUTER VISION, 2016, 116 (01) : 1 - 20
  • [17] Karatzas D, 2015, PROC INT CONF DOC, P1156, DOI 10.1109/ICDAR.2015.7333942
  • [18] ICDAR 2013 Robust Reading Competition
    Karatzas, Dimosthenis
    Shafait, Faisal
    Uchida, Seiichi
    Iwamura, Masakazu
    Gomez i Bigorda, Lluis
    Robles Mestre, Sergi
    Mas, Joan
    Fernandez Mota, David
    Almazan Almazan, Jon
    Pere de las Heras, Lluis
    [J]. 2013 12TH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR), 2013: 1484 - 1493
  • [19] Li H, 2019, AAAI CONF ARTIF INTE, P8610
  • [20] Liao MH, 2019, AAAI CONF ARTIF INTE, P8714