THE INTEGRATION OF DOCUMENT IMAGE-PROCESSING AND TEXT RETRIEVAL PRINCIPLES

Author
VANDERMERWE, N
Affiliation
[1] Xcel, Alkantrant 0005
Source
ELECTRONIC LIBRARY | 1993 / Vol. 11 / Issue 4-5
DOI
10.1108/eb045245
Chinese Library Classification (CLC) number
G25 [Library science, librarianship]; G35 [Information science, information work];
Discipline classification code
1205 ; 120501 ;
Abstract
This paper discusses the integration of document image processing and text retrieval principles so that existing paper documents can be processed and loaded automatically into an electronic document database. Such a database lets users retrieve relevant information more accurately, without going through costly processes to convert paper documents into electronic text. The paper first outlines the principles of document image processing systems, together with the problems and shortcomings of most of today's systems. It then presents concept retrieval as the latest development in text retrieval, with specific reference to the TOPIC intelligent text retrieval system, which lets users build up a knowledge base of search objects, or concepts, that all users of the system can use at any time. The paper next examines the automatic processing of paper documents, in which scanned page images are converted to electronic text. It covers the use of optical character recognition (OCR) technology, the indexing and loading of the documents into a text database, the automatic linking of the documents to their related document images, and the retrieval technology available in TOPIC, specifically the TYPO operator, which was developed to handle so-called dirty data such as common misspellings, character transpositions and 'dirty' text output by the OCR process. A possible solution for loading paper documents quickly and cost-effectively into an electronic document database is discussed and demonstrated in detail. The advantages and disadvantages of this approach are discussed with specific reference to an electronic news clipping service application.
Pages: 273-278
Number of pages: 6