Single Shot Text Detector with Regional Attention

Cited by: 219
Authors
He, Pan [1]
Huang, Weilin [2, 3]
He, Tong [3]
Zhu, Qile [1]
Qiao, Yu [3]
Li, Xiaolin [1]
Affiliations
[1] Univ Florida, Natl Sci Fdn, Ctr Big Learning, Gainesville, FL 32611 USA
[2] Univ Oxford, Dept Engn Sci, Oxford, England
[3] Chinese Acad Sci, Shenzhen Inst Adv Technol, Guangdong Prov Key Lab Comp Vis & Virtual Real Te, Shenzhen, Peoples R China
Source
2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV) | 2017
Funding
National Science Foundation (USA); National Natural Science Foundation of China; National Institutes of Health (USA)
DOI
10.1109/ICCV.2017.331
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We present a novel single-shot text detector that directly outputs word-level bounding boxes in a natural image. We propose an attention mechanism which roughly identifies text regions via an automatically learned attentional map. This substantially suppresses background interference in the convolutional features, which is the key to producing accurate inference of words, particularly at extremely small sizes. This results in a single model that essentially works in a coarse-to-fine manner. It departs from recent FCN-based text detectors which cascade multiple FCN models to achieve an accurate prediction. Furthermore, we develop a hierarchical inception module which efficiently aggregates multi-scale inception features. This enhances local details, and also encodes strong context information, allowing the detector to work reliably on multi-scale and multi-orientation text with single-scale images. Our text detector achieves an F-measure of 77% on the ICDAR 2015 benchmark, advancing the state-of-the-art results in [18, 28]. Demo is available at: http://sstd.whuang.org/.
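The abstract describes an attentional map that suppresses background interference in the convolutional features. The following is only a minimal sketch of that general idea, not the authors' implementation: it assumes the attention map is a per-pixel softmax over two logits (background, text) and that features are scaled elementwise by the text probability. The function names `text_attention` and `apply_attention` and the toy values are hypothetical, and the example uses plain Python lists rather than a deep learning framework.

```python
import math


def text_attention(logits_bg, logits_text):
    """Per-pixel softmax over (background, text) logits.

    Returns an H x W map of text probabilities (assumed form of the attention map).
    """
    att = []
    for row_bg, row_tx in zip(logits_bg, logits_text):
        out_row = []
        for b, t in zip(row_bg, row_tx):
            m = max(b, t)  # subtract the max for numerical stability
            eb, et = math.exp(b - m), math.exp(t - m)
            out_row.append(et / (eb + et))
        att.append(out_row)
    return att


def apply_attention(features, att):
    """Scale every channel of a C x H x W feature map by the H x W attention map."""
    return [
        [[v * a for v, a in zip(f_row, a_row)] for f_row, a_row in zip(channel, att)]
        for channel in features
    ]


# Toy 2 x 2 example (hypothetical values): the left column looks like text,
# the right column looks like background.
logits_bg = [[-4.0, 4.0], [-4.0, 4.0]]
logits_text = [[4.0, -4.0], [4.0, -4.0]]
features = [[[1.0, 1.0], [2.0, 2.0]]]  # one channel, 2 x 2

att = text_attention(logits_bg, logits_text)
masked = apply_attention(features, att)
```

In this toy case the feature values in the left column pass through almost unchanged, while those in the right column are driven close to zero, which is the kind of background suppression the abstract refers to.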
Pages: 3066-3074
Number of pages: 9
Related papers
38 in total
[1] [Anonymous], 2017, CVPR
[2] [Anonymous], 2014, ECCV
[3] [Anonymous], 2015, CVPR
[4] [Anonymous], 2014, ACM MM
[5] [Anonymous], 2012, CVPR
[6] [Anonymous], 2016, ARXIV PREPRINT ARXIV
[7] [Anonymous], CVPR
[8] [Anonymous], AAAI
[9] [Anonymous], 2015, ICDAR
[10] [Anonymous], 2010, CVPR