A generic method of cleaning and enhancing handwritten data from business forms

Cited by: 7
Authors
Ye X. [1, 2]
Cheriet M. [2 ]
Suen C.Y. [1 ]
Affiliations
[1] Centre for Pattern Recognition and Machine Intelligence, Concordia University, Montréal, QC H3G 1M8, Suite GM606
[2] Laboratory for Imagery, Vision and Artificial Intelligence, École de Technologie Supérieure, University of Quebec, Montréal, QC H3C 1K3, 1100, rue Notre-Dame Ouest
Keywords
Form processing; Goal-directed evaluation; Handwriting recognition; Item extraction; Mathematical morphology
DOI
10.1007/s100320100056
Abstract
The automation of business form processing is attracting intensive research interest because of its wide applicability and because it reduces the heavy workload of manual processing. Preparing clean and clear images for the recognition engines is often taken for granted as a trivial task that requires little attention. In reality, handwritten data usually touch or cross the preprinted form frames and texts, creating tremendous problems for the recognition engines. In this paper, we contribute answers to two questions: "Why do we need cleaning and enhancement procedures in form processing systems?" and "How can we clean and enhance the hand-filled items with easy implementation and high processing speed?" Here, we propose a generic system that includes only cleaning and enhancing phases. In the cleaning phase, the system registers a template to the input form by aligning corresponding landmarks. A unified morphological scheme is proposed to remove the form frames and restore the broken handwriting from gray or binary images. When the handwriting is found touching or crossing preprinted texts, morphological operations based on statistical features are used to clean it. In applications where a black-and-white scanning mode is adopted, handwriting may contain broken or hollow strokes due to improper thresholding parameters. Therefore, we have designed a module to enhance the image quality based on morphological operations. Subjective and objective evaluations were carried out to show the effectiveness of the proposed procedures. © 2001 Springer-Verlag Berlin Heidelberg.
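The abstract names the idea of removing form frames with morphological operations and then restoring the handwriting broken by the removal, but gives no details. The following is a minimal illustrative sketch of that general idea on a binary image, not the authors' unified scheme: it assumes a perfectly horizontal frame line, uses an opening with a horizontal line element to find it, and a closing with a short vertical line element to bridge the strokes it cut. The function names and the element lengths `k` and `m` are hypothetical choices made for this example.

```python
# Illustrative sketch only: binary morphology on lists of 0/1 rows (1 = ink).
# Helper names and the lengths k and m are assumptions, not taken from the paper.

def open_horizontal(img, k):
    """Opening with a 1 x k line: keep ink only in horizontal runs of length >= k."""
    out = [[0] * len(row) for row in img]
    for y, row in enumerate(img):
        x = 0
        while x < len(row):
            if not row[x]:
                x += 1
                continue
            start = x
            while x < len(row) and row[x]:
                x += 1
            if x - start >= k:
                for i in range(start, x):
                    out[y][i] = 1
    return out


def close_vertical(img, m):
    """Closing with an m x 1 line, treating pixels outside the image as background:
    fill vertical gaps shorter than m that have ink directly above and below."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for x in range(w):
        y = 0
        while y < h:
            if img[y][x]:
                y += 1
                continue
            start = y
            while y < h and not img[y][x]:
                y += 1
            bounded = start > 0 and y < h  # gap does not touch the image border
            if bounded and y - start < m:
                for i in range(start, y):
                    out[i][x] = 1
    return out


def remove_frame_line(img, k=8, m=3):
    """Drop long horizontal lines, then restore stroke pixels the removal cut out."""
    line = open_horizontal(img, k)  # mask of the presumed frame line
    cleaned = [[p & (1 - l) for p, l in zip(pr, lr)] for pr, lr in zip(img, line)]
    closed = close_vertical(cleaned, m)
    # Restore only at positions that belonged to the removed line.
    return [
        [c | (f & l) for c, f, l in zip(cr, fr, lr)]
        for cr, fr, lr in zip(cleaned, closed, line)
    ]
```

Restricting the restoration to pixels under the removed line keeps the closing from merging unrelated strokes elsewhere in the image. Real forms would also need skew handling, vertical frame lines, and gray-level input, none of which this sketch covers.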
Pages: 84-96
Page count: 12