Scene Text Detection and Recognition: The Deep Learning Era

Times cited: 219
Authors
Long, Shangbang [1]
He, Xin [2]
Yao, Cong [3]
Affiliations
[1] Carnegie Mellon Univ, Sch Comp Sci, Machine Learning Dept, Pittsburgh, PA 15213 USA
[2] ByteDance Ltd, Beijing, Peoples R China
[3] MEGVII Inc Face, Beijing, Peoples R China
Keywords
Scene text; Optical character recognition; Detection; Recognition; Deep learning; Survey; OBJECT DETECTION; NEURAL-NETWORK; IMAGES; LOCALIZATION; EXTRACTION; VIDEO
DOI
10.1007/s11263-020-01369-0
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
With the rise and development of deep learning, computer vision has been tremendously transformed and reshaped. As an important research area in computer vision, scene text detection and recognition has inevitably been influenced by this wave of revolution, consequently entering the era of deep learning. In recent years, the community has witnessed substantial advancements in mindset, methodology and performance. This survey aims to summarize and analyze the major changes and significant progress in scene text detection and recognition in the deep learning era. Through this article, we aim to: (1) introduce new insights and ideas; (2) highlight recent techniques and benchmarks; (3) look ahead to future trends. Specifically, we emphasize the dramatic differences brought by deep learning and the grand challenges that remain. We expect that this review will serve as a reference book for researchers in this field. Related resources are also collected in our GitHub repository (https://github.com/Jyouhou/SceneTextPapers).
Pages: 161-184
Number of pages: 24