Document image analysis: A primer

Cited by: 63
Authors
Kasturi, R [1]
O'Gorman, L [1]
Govindaraju, V [1]
Affiliation
[1] Penn State Univ, Dept Comp Sci & Engn, University Pk, PA 16802 USA
Source
SADHANA-ACADEMY PROCEEDINGS IN ENGINEERING SCIENCES | 2002, Vol. 27, No. 1
Keywords
OCR; feature analysis; document processing; graphics recognition; character recognition; layout analysis
DOI
10.1007/BF02703309
Chinese Library Classification (CLC) number
T [Industrial Technology]
Discipline classification code
08
Abstract
Document image analysis refers to algorithms and techniques that are applied to images of documents to obtain a computer-readable description from pixel data. A well-known document image analysis product is the Optical Character Recognition (OCR) software that recognizes characters in a scanned document. OCR makes it possible for the user to edit or search the document's contents. In this paper we briefly describe various components of a document analysis system. Many of these basic building blocks are found in most document analysis systems, irrespective of the particular domain or language to which they are applied. We hope that this paper will help the reader by providing the background necessary to understand the detailed descriptions of specific techniques presented in other papers in this issue.
Pages: 3-22
Page count: 20