Information retrieval and machine learning for probabilistic schema matching

Cited by: 22
Authors
Nottelmann, Henrik
Straccia, Umberto
Affiliations
[1] CNR, ISTI, I-56124 Pisa, Italy
[2] Univ Duisburg Essen, Dept Informat, D-47048 Duisburg, Germany
Keywords
schema matching; data exchange; probability theory; sPLMap
DOI
10.1016/j.ipm.2006.10.014
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Schema matching is the problem of finding correspondences (mapping rules, e.g. logical formulae) between heterogeneous schemas, e.g. in the data exchange domain or for distributed IR in federated digital libraries. This paper introduces a probabilistic framework, called sPLMap, for automatically learning schema mapping rules based on given instances of both schemas. Different techniques, mostly from the IR and machine learning fields, are combined to find suitable mapping candidates. Our approach gives a probabilistic interpretation of the prediction weights of the candidates, selects the rule set with the highest matching probability, and outputs probabilistic rules that are capable of dealing with the intrinsic uncertainty of the mapping process. The approach and several of its variants have been evaluated on several test sets. © 2006 Elsevier Ltd. All rights reserved.
Pages: 552-576
Number of pages: 25