Camera-based analysis of text and documents: A survey

Cited by: 234
Authors
Liang J. [1]
Doermann D. [1]
Li H. [2]
Affiliations
[1] Language and Media Processing Laboratory, Institute for Advanced Computer Studies, University of Maryland, College Park, MD
[2] Applied Media Analysis, Inc., Ellicott City, MD
Source
International Journal of Document Analysis and Recognition (IJDAR) | 2005 / Vol. 7 / Issue 2-3
Keywords
Imaging Device; Document Image; Cellular Phone; Robust Solution; Multiple Frame
DOI
10.1007/s10032-004-0138-z
Abstract
The increasing availability of high-performance, low-priced, portable digital imaging devices has created a tremendous opportunity for supplementing traditional scanning for document image acquisition. Digital cameras attached to cellular phones, PDAs, or wearable computers, and standalone image or video devices are highly mobile and easy to use; they can capture images of thick books, historical manuscripts too fragile to touch, and text in scenes, making them much more versatile than desktop scanners. Should robust solutions to the analysis of documents captured with such devices become available, there will clearly be a demand in many domains. Traditional scanner-based document analysis techniques provide us with a good reference and starting point, but they cannot be used directly on camera-captured images. Camera-captured images can suffer from low resolution, blur, and perspective distortion, as well as complex layout and interaction of the content and background. In this paper we present a survey of application domains, technical challenges, and solutions for the analysis of documents captured by digital cameras. We begin by describing typical imaging devices and the imaging process. We discuss document analysis from a single camera-captured image as well as multiple frames and highlight some sample applications under development and feasible ideas for future development. © Springer-Verlag 2005.
Pages: 84-104
Page count: 20