Scene Text Detection and Recognition: The Deep Learning Era

Cited by: 218
Authors
Long, Shangbang [1]
He, Xin [2]
Yao, Cong [3]
Affiliations
[1] Carnegie Mellon Univ, Sch Comp Sci, Machine Learning Dept, Pittsburgh, PA 15213 USA
[2] ByteDance Ltd, Beijing, Peoples R China
[3] MEGVII Inc Face, Beijing, Peoples R China
Keywords
Scene text; Optical character recognition; Detection; Recognition; Deep learning; Survey; OBJECT DETECTION; NEURAL-NETWORK; IMAGES; LOCALIZATION; EXTRACTION; VIDEO;
DOI
10.1007/s11263-020-01369-0
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
With the rise and development of deep learning, computer vision has been tremendously transformed and reshaped. As an important research area in computer vision, scene text detection and recognition has inevitably been influenced by this wave of revolution, consequently entering the era of deep learning. In recent years, the community has witnessed substantial advancements in mindset, methodology, and performance. This survey aims to summarize and analyze the major changes and significant progress in scene text detection and recognition in the deep learning era. Through this article, we seek to: (1) introduce new insights and ideas; (2) highlight recent techniques and benchmarks; (3) look ahead to future trends. Specifically, we emphasize the dramatic differences brought by deep learning and the remaining grand challenges. We expect this review to serve as a reference for researchers in this field. Related resources are also collected in our GitHub repository (https://github.com/Jyouhou/SceneTextPapers).
Pages: 161-184
Page count: 24