SaHAN: Scale-aware hierarchical attention network for scene text recognition

Cited by: 7

Authors:

Zhang, Jiaxin ^{[1]}

Luo, Canjie ^{[1]}

Jin, Lianwen ^{[1,2]}

Wang, Tianwei ^{[1]}

Li, Ziyan ^{[1,2]}

Zhou, Weiying ^{[1]}

Affiliations:

[1] South China Univ Technol, Sch Elect & Informat Engn, Guangzhou 510000, Peoples R China

[2] SCUT Zhuhai Inst Modern Ind Innovat, Zhuhai 519000, Peoples R China

Source:

PATTERN RECOGNITION LETTERS | 2020 / Vol. 136

Keywords:

Scene text recognition; Character scale-variation problem; Multi-scale features; Hierarchical attention decoder;

DOI:

10.1016/j.patrec.2020.06.009

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Scene text recognition has become a research hotspot owing to its abundant semantic information and various applications. Recent scene text recognition methods usually focus on handling shape distortion, attention drift, or background noise, ignoring the fact that text recognition also encounters the character scale-variation problem. To address this issue, we propose a new scale-aware hierarchical attention network (SaHAN) for scene text recognition. Inspired by the feature pyramid network, we exploit the inherent pyramidal structure of a deep convolutional network to retain multi-scale features for flexible receptive fields. We then construct a hierarchical attention decoder that applies the attention mechanism twice on multi-scale features to collect the most fine-grained information for prediction. SaHAN is trained in a weakly supervised manner, requiring only images and their corresponding text labels. Extensive experiments on seven benchmarks show that SaHAN achieves state-of-the-art performance. © 2020 Elsevier B.V. All rights reserved.
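The idea of applying attention twice over multi-scale features can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify how the two attention steps are arranged, so the sketch assumes that the first step attends over positions within each scale and the second attends over the resulting per-scale context vectors. The function names (`attend`, `hierarchical_attention`), the use of plain dot-product scoring, and the random toy features are all hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attend(query, features):
    """Dot-product attention (an assumed scoring function).

    Returns the weighted sum of the feature vectors and the weights.
    """
    weights = softmax([dot(query, f) for f in features])
    dim = len(features[0])
    context = [sum(w * f[i] for w, f in zip(weights, features))
               for i in range(dim)]
    return context, weights


def hierarchical_attention(query, multi_scale_features):
    """Hypothetical two-stage attention over a feature pyramid.

    Stage 1: attend over the positions within each scale.
    Stage 2: attend over the per-scale context vectors from stage 1.
    """
    per_scale = [attend(query, feats)[0] for feats in multi_scale_features]
    return attend(query, per_scale)


random.seed(0)
dim = 4
# Toy pyramid: three scales with 8, 4, and 2 positions each.
pyramid = [[[random.gauss(0, 1) for _ in range(dim)] for _ in range(n)]
           for n in (8, 4, 2)]
# Stand-in for a decoder hidden state at one decoding step.
query = [random.gauss(0, 1) for _ in range(dim)]

context, scale_weights = hierarchical_attention(query, pyramid)
```

Here `scale_weights` holds one weight per scale, which is one way a decoder could favor the scale that best matches the current character; whether SaHAN does this in the same manner is not stated in the abstract.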

Citation

Pages: 205-211

Page count: 7

References: 44 in total

