Review of Natural Scene Text Detection and Recognition Based on Deep Learning

Authors
Wang J.-X. [1, 2]
Wang Z.-Y. [1, 2]
Tian X. [1, 2]
Affiliations
[1] School of Information Science and Technology, Beijing Forestry University, Beijing
[2] Engineering Research Center for Forestry-oriented Intelligent Information Processing of National Forestry and Grassland Administration, Beijing Forestry University, Beijing
Source
Ruan Jian Xue Bao/Journal of Software | 2020, Vol. 31, No. 5
Keywords
Deep learning; End-to-end; Natural scene; Text detection; Text recognition
DOI
10.13328/j.cnki.jos.005988
Abstract
Natural scene text detection and recognition is important for extracting information from scenes, and deep learning can improve it. This study classifies, analyzes, and summarizes deep learning-based methods for text detection and recognition in natural scenes. First, the research background and the main technical research routes are discussed. Then, following the processing phases of natural scene text information processing, text detection models, text recognition models, and end-to-end text recognition models are introduced, and the basic ideas, advantages, and disadvantages of each method are discussed and analyzed. Next, the common standard datasets and the performance evaluation indicators and functions are enumerated, and the experimental results of different models are compared and analyzed. Finally, the challenges and development trends of deep learning-based text detection and recognition in natural scenes are summarized. © Copyright 2020, Institute of Software, the Chinese Academy of Sciences. All rights reserved.
Pages: 1465-1496
Page count: 31