Biblio: automatic meta-data extraction

Cited by: 3
Authors
Staelin, Carl [1]
Elad, Michael
Greig, Darryl
Shmueli, Oded
Vans, Marie
Affiliations
[1] Hewlett Packard Labs, IL-32000 Technion, Haifa, Israel
[2] Hewlett Packard Labs, Bristol BS34 8QZ, Stoke Gifford, England
[3] Technion Israel Inst Technol, Dept Comp Sci, IL-32000 Haifa, Israel
Keywords
document recognition; document understanding; neural networks; support vector machines; machine learning
DOI
10.1007/s10032-006-0032-y
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Biblio is an adaptive system that automatically extracts meta-data from semi-structured and structured scanned documents. Instead of using hand-coded templates or other methods manually customized for each given document format, it uses example-based machine learning to adapt to customer-defined document and meta-data types. We provide results from experiments on the recognition of document information in two document corpuses: a set of scanned journal articles and a set of scanned legal documents. The first set is semi-structured, as the different journals use a variety of flexible layouts. The second set is largely free-form text based on poor quality scans of FAX-quality legal documents. We demonstrate accuracy on the semi-structured document set roughly comparable to hand-coded systems, and much worse performance on the legal documents.
Pages: 113-126
Page count: 14