Camera-based analysis of text and documents: A survey

Cited by: 234
Authors
Liang J. [1]
Doermann D. [1]
Li H. [2]
Affiliations
[1] Language and Media Processing Laboratory, Institute for Advanced Computer Studies, University of Maryland, College Park, MD
[2] Applied Media Analysis, Inc., Ellicott City, MD
Source
International Journal of Document Analysis and Recognition (IJDAR) | 2005 / Vol. 7 / Issue 2-3
Keywords
Imaging Device; Document Image; Cellular Phone; Robust Solution; Multiple Frame
DOI
10.1007/s10032-004-0138-z
Abstract
The increasing availability of high-performance, low-priced, portable digital imaging devices has created a tremendous opportunity for supplementing traditional scanning for document image acquisition. Digital cameras attached to cellular phones, PDAs, or wearable computers, and standalone image or video devices are highly mobile and easy to use; they can capture images of thick books, historical manuscripts too fragile to touch, and text in scenes, making them much more versatile than desktop scanners. Should robust solutions to the analysis of documents captured with such devices become available, there will clearly be a demand in many domains. Traditional scanner-based document analysis techniques provide us with a good reference and starting point, but they cannot be used directly on camera-captured images. Camera-captured images can suffer from low resolution, blur, and perspective distortion, as well as complex layout and interaction of the content and background. In this paper we present a survey of application domains, technical challenges, and solutions for the analysis of documents captured by digital cameras. We begin by describing typical imaging devices and the imaging process. We discuss document analysis from a single camera-captured image as well as multiple frames and highlight some sample applications under development and feasible ideas for future development. © Springer-Verlag 2005.
Pages: 84-104
Page count: 20