Automated template-based metadata extraction architecture

Cited by: 0
Authors
Flynn, Paul [1]
Zhou, Li [1]
Maly, Kurt [1]
Zeil, Steven [1]
Zubair, Mohammad [1]
Affiliations
[1] Old Dominion Univ, Dept Comp Sci, Norfolk, VA 23529 USA
Source
ASIAN DIGITAL LIBRARIES: LOOKING BACK 10 YEARS AND FORGING NEW FRONTIERS, PROCEEDINGS | 2007 / Vol. 4822
Keywords
metadata; heterogeneous collections; automation
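DOI
Not available
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract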
This paper describes our efforts to develop a toolset and process for automated metadata extraction from large, diverse, and evolving document collections. A number of federal agencies, universities, laboratories, and companies are placing their collections online and making them searchable via metadata fields such as author, title, and publishing organization. Manually creating metadata for a large collection is extremely time-consuming, yet the task is difficult to automate, particularly for collections consisting of documents with diverse layout and structure. Our automated process enables many more documents to be available online than would otherwise have been possible due to time and cost constraints. We describe our architecture and implementation and illustrate the effectiveness of the toolset by providing experimental results on two major collections: DTIC (Defense Technical Information Center) and NASA (National Aeronautics and Space Administration).
Pages: 327-336
Page count: 10