A Hybrid Case-based and Rule-based for Metadata Extraction on Heterogeneous Thai Documents

Times cited: 0
Authors
Khankasikam, Krisda [1]
Affiliation
[1] Naresuan Univ Phayao, Sch Informat Commun & Technol, Muang Phayao, Thailand
Source
2010 2ND INTERNATIONAL CONFERENCE ON COMPUTER AND AUTOMATION ENGINEERING (ICCAE 2010), VOL 1 | 2010
Keywords
Case-based Reasoning; Rule-based Reasoning; Metadata Extraction; Thai Documents;
DOI
10.1109/ICCAE.2010.5451943
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
This paper reports on a human-assisted process for extracting metadata from Thai documents. Now that the Internet infrastructure for global data access is fully functional, a growing number of Thai archives are being placed online for sharing. However, many documents in these archives lack metadata. Missing metadata hinders not only the discovery and dissemination of these documents over the Internet but also their connectivity with other documents. Extracting these metadata elements manually is highly labor-intensive, costly, and time-consuming for a large document collection, so automation is the key to solving the problem; however, most existing automated metadata extraction approaches focus on specific domains and homogeneous documents. This paper proposes a combined case-based and rule-based metadata extraction approach to address these issues. The key idea for handling heterogeneity is to use a rule-based approach to classify documents into equivalent groups, so that each group contains only similar documents. Then, for each document group, the system applies a case-based reasoning cycle that extracts metadata from the documents in that group. The system achieves a precision of 62.31% to 90.78%, depending on the characteristics of the data set.
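The two-stage idea in the abstract (rule-based grouping first, then a case-based reasoning cycle of retrieve, reuse, revise, and retain inside each group) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the classification rules, the bag-of-tokens layout features, the Jaccard similarity measure, and the representation of a case as a mapping from metadata field to line index are all hypothetical choices. Whitespace tokenization is also a simplification, since Thai text does not separate words with spaces. The `revise` callback is a hypothetical stand-in for the human-assisted correction step.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Optional, Tuple

# Hypothetical classification rules: (group name, predicate over the document's lines).
Rule = Tuple[str, Callable[[List[str]], bool]]

RULES: List[Rule] = [
    ("thesis", lambda lines: any("thesis" in ln.lower() for ln in lines[:5])),
    ("journal_article", lambda lines: any("abstract" in ln.lower() for ln in lines[:10])),
]


def classify(lines: List[str], rules: List[Rule] = RULES) -> str:
    """Rule-based step: return the group of the first rule that fires, else 'other'."""
    for group, predicate in rules:
        if predicate(lines):
            return group
    return "other"


def layout_features(lines: List[str], n: int = 10) -> FrozenSet[str]:
    """Assumed features: lowercase whitespace tokens from the first n lines."""
    return frozenset(tok.lower() for ln in lines[:n] for tok in ln.split())


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Set-overlap similarity in [0, 1]; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


@dataclass
class Case:
    features: FrozenSet[str]
    solution: Dict[str, int]  # metadata field -> index of the line holding its value


class GroupedCaseBase:
    """One case base per document group, so retrieval never crosses groups."""

    def __init__(self) -> None:
        self.cases: Dict[str, List[Case]] = {}

    def retrieve(self, group: str, feats: FrozenSet[str]) -> Optional[Case]:
        candidates = self.cases.get(group, [])
        return max(candidates, key=lambda c: jaccard(c.features, feats), default=None)

    def retain(self, group: str, case: Case) -> None:
        self.cases.setdefault(group, []).append(case)


Reviser = Callable[[List[str], Dict[str, int]], Dict[str, int]]


def extract(lines: List[str], base: GroupedCaseBase,
            revise: Optional[Reviser] = None) -> Dict[str, str]:
    group = classify(lines)                          # 1. rule-based grouping
    feats = layout_features(lines)
    case = base.retrieve(group, feats)               # 2. retrieve most similar case in group
    solution = dict(case.solution) if case else {}   # 3. reuse its field positions
    if revise is not None:                           # 4. revise (stand-in for human correction)
        solution = revise(lines, solution)
    if solution:                                     # 5. retain the solution as a new case
        base.retain(group, Case(feats, solution))
    return {f: lines[i] for f, i in solution.items() if 0 <= i < len(lines)}


if __name__ == "__main__":
    base = GroupedCaseBase()
    seed = ["Master's Thesis", "Title of thesis A", "Author A", "2009"]
    base.retain(classify(seed), Case(layout_features(seed), {"title": 1, "creator": 2, "date": 3}))
    new_doc = ["Master's Thesis", "Title of thesis B", "Author B", "2010"]
    print(extract(new_doc, base))
    # {'title': 'Title of thesis B', 'creator': 'Author B', 'date': '2010'}
```

Keeping a separate case base per group means a new document is only ever compared with documents the rules have already judged similar, which is how the grouping step is meant to contain heterogeneity before case-based extraction runs.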
Pages: 312-317
Number of pages: 6