OCRSpell: An interactive spelling correction system for OCR errors in text

Cited by: 32
Authors
Taghva K. [1]
Stofsky E. [1]
Affiliation
[1] Information Science Research Institute, University of Nevada, Las Vegas, Las Vegas
Keywords
Error correction; Information retrieval; OCR-Spell checkers; Scanning
DOI
10.1007/PL00013558
Abstract
In this paper, we describe a spelling correction system designed specifically for OCR-generated text that selects candidate words through the use of information gathered from multiple knowledge sources. This system for text correction is based on static and dynamic device mappings, approximate string matching, and n-gram analysis. Our statistically based, Bayesian system incorporates a learning feature that collects confusion information at the collection and document levels. An evaluation of the new system is presented as well. © 2001 Springer-Verlag Berlin Heidelberg.
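The abstract combines device mappings, approximate string matching, Bayesian ranking, and learned confusion information. Below is a minimal Python sketch of how those pieces might fit together. It is not OCRSpell's implementation. The confusion pairs in `STATIC_MAP`, the constants `P_MAPPED` and `P_EDIT1`, and the word-level learning in `learn` are hypothetical illustrations. The helpers `mapped_variants` and `Corrector` are likewise names invented for this sketch. The n-gram analysis the paper mentions is omitted. Candidates are scored as prior(word) × P(observed | word), which is the usual noisy-channel form of Bayesian ranking.

```python
from collections import Counter

# Hypothetical static "device mapping": OCR output substring -> intended substring.
# These pairs are illustrative assumptions, not OCRSpell's actual mapping tables.
STATIC_MAP = {"rn": "m", "cl": "d", "vv": "w", "0": "o", "1": "l"}

# Assumed error-model likelihoods P(observed | intended); hand-picked, not estimated.
P_MAPPED = 0.1   # candidate reached by one static-map rewrite
P_EDIT1 = 0.01   # candidate within Levenshtein distance 1


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance with unit-cost insert/delete/substitute."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def mapped_variants(word: str) -> set[str]:
    """All strings obtained by rewriting exactly one occurrence of a mapped substring."""
    out = set()
    for src, dst in STATIC_MAP.items():
        start = word.find(src)
        while start != -1:
            out.add(word[:start] + dst + word[start + len(src):])
            start = word.find(src, start + 1)
    return out


class Corrector:
    def __init__(self, word_counts: dict[str, int]):
        self.counts = Counter(word_counts)
        self.total = sum(self.counts.values())
        # Dynamic knowledge (assumed word-level here): confirmed (observed, accepted) pairs.
        self.learned: Counter = Counter()

    def prior(self, word: str) -> float:
        """Add-one smoothed unigram prior P(word) over the lexicon."""
        return (self.counts[word] + 1) / (self.total + len(self.counts))

    def learn(self, observed: str, accepted: str) -> None:
        """Record a user-confirmed correction so it ranks higher next time."""
        self.learned[(observed, accepted)] += 1

    def rank(self, token: str) -> list[tuple[str, float]]:
        """Return lexicon candidates sorted by prior * likelihood, highest first."""
        likelihood: dict[str, float] = {}
        for cand in mapped_variants(token):
            if cand in self.counts:
                likelihood[cand] = max(likelihood.get(cand, 0.0), P_MAPPED)
        for cand in self.counts:
            if levenshtein(token, cand) == 1:
                likelihood[cand] = max(likelihood.get(cand, 0.0), P_EDIT1)
        for (obs, cand), n in self.learned.items():
            if obs == token and cand in self.counts:
                likelihood[cand] = max(likelihood.get(cand, 0.0), n / (n + 1))
        scored = [(c, self.prior(c) * p) for c, p in likelihood.items()]
        return sorted(scored, key=lambda cp: cp[1], reverse=True)
```

For example, with the toy lexicon `{"modern": 5, "modem": 3, "lead": 10, "load": 4}`, `rank("rnodern")` returns only `modern`, which is reached through the `rn` to `m` mapping. `rank("lcad")` ranks the more frequent `lead` above `load`. After `learn("lcad", "load")`, `load` moves to the top. This is a rough analogue of collecting confusion information at the document level.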
Pages: 125-137
Page count: 12