OMNIDOCUMENT TECHNOLOGIES

Cited by: 67

Authors:

BOKSER, M

Affiliation:

[1] Calera Recognition Systems, Inc., Sunnyvale, CA

Source:

PROCEEDINGS OF THE IEEE | 1992 / Vol. 80 / No. 07

Keywords:

TEXT RECOGNITION; OCR; OMNIFONT; MULTIFONT; POLYFONT; FEATURE EXTRACTION; CLASSIFICATION;

DOI:

10.1109/5.156470

Chinese Library Classification (CLC):

TM [Electrical Engineering]; TN [Electronics and Communication Technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

With recent technical advances, OCR is now a viable technology for a wide range of applications. Calera's OCR engine is omnifont and reasonably robust on individual degraded characters. The weakest link is its handling of characters which are difficult to segment, such as characters which are joined to adjacent characters. The engine is divided into four phases: segmentation, image recognition, ambiguity resolution, and document analysis. The features are zonal and reduce the image to a blurred, gray-level representation. The classifier is data-driven, trained off-line, and model-free. We found that handcrafted features and decision trees tend to be brittle in the presence of noise. To satisfy the needs of full-text applications, the system captures the structure of the document so that, when viewed in a word processor or spreadsheet program, the formatting of the OCR'd document reflects the formatting of the original document. To satisfy the needs of the forms market, a proofing and correction tool displays "pop-up" images of uncertain characters.
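
The abstract's remark that the features are zonal and reduce the image to a blurred, gray-level representation can be illustrated with a minimal sketch. This is not Calera's implementation: the function name `zonal_features`, the 2 x 2 grid, and the use of mean ink density per zone are all assumptions chosen only to show the general idea of zoning a binary glyph into a coarse gray-level vector.

```python
from typing import List


def zonal_features(glyph: List[List[int]],
                   zones_y: int = 2,
                   zones_x: int = 2) -> List[float]:
    """Hypothetical zonal feature extractor (illustrative assumption).

    Splits a binary glyph (1 = ink, 0 = background) into a grid of
    zones and returns the mean ink density of each zone in row-major
    order. The result is a coarse, gray-level summary of the glyph.
    """
    height, width = len(glyph), len(glyph[0])
    features = []
    for zy in range(zones_y):
        # Integer zone boundaries along the vertical axis.
        y0 = zy * height // zones_y
        y1 = (zy + 1) * height // zones_y
        for zx in range(zones_x):
            # Integer zone boundaries along the horizontal axis.
            x0 = zx * width // zones_x
            x1 = (zx + 1) * width // zones_x
            pixels = [glyph[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            # Empty zones (glyph smaller than the grid) contribute 0.0.
            features.append(sum(pixels) / len(pixels) if pixels else 0.0)
    return features


glyph = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
]
print(zonal_features(glyph))
```

Averaging within zones discards exact pixel positions, which is one plausible reason such features tolerate small amounts of noise better than handcrafted structural features; the abstract itself does not state which zoning scheme or blurring the engine uses.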


Pages: 1066-1078

Page count: 13
