Automatic generation of probabilistic relationships for improving schema matching

Times cited: 19
Authors
Po, Laura [1]
Sorrentino, Serena [1]
Affiliation
[1] Univ Modena & Reggio Emilia, Dept 2, I-41125 Modena, Italy
Keywords
Semantic relationships; Probabilistic schema mapping; Word sense disambiguation; Schema normalization
DOI
10.1016/j.is.2010.09.004
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Schema matching is the problem of finding relationships among concepts across data sources that are heterogeneous in format and in structure. Starting from the "hidden meaning" associated with schema labels (i.e. class/attribute names), it is possible to discover lexical relationships among the elements of different schemata. In this work, we propose an automatic method for discovering probabilistic lexical relationships in the context of "on the fly" data integration. Our method is based on a probabilistic lexical annotation technique, which automatically associates one or more meanings with schema elements w.r.t. a thesaurus/lexical resource. However, the accuracy of automatic lexical annotation methods on real-world schemata suffers from the abundance of non-dictionary words such as compound nouns and abbreviations. We address this problem by including a schema label normalization method, which increases the number of comparable labels. From the annotated schemata, we derive the probabilistic lexical relationships, which are collected in the Probabilistic Common Thesaurus. The method is applied within the MOMIS data integration system but can easily be generalized to other data integration systems. (C) 2010 Elsevier B.V. All rights reserved.
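The abstract describes a pipeline of label normalization, probabilistic lexical annotation, and derivation of probabilistic relationships. The Python sketch below illustrates these ideas on toy data only; it is not the MOMIS implementation. The abbreviation table `ABBREVIATIONS`, the sense inventory `SENSES` (a stand-in for a thesaurus such as WordNet) and its probabilities, and the helper names `normalize_label` and `synonym_probability` are all hypothetical. The synonymy score assumes the two annotations are independent, so the probability that both words carry the same sense is the sum, over their shared senses, of the product of the two annotation probabilities.

```python
import re

# Hypothetical abbreviation dictionary; a real system would rely on a
# curated abbreviation resource rather than this toy mapping.
ABBREVIATIONS = {"qty": "quantity", "desc": "description", "cust": "customer"}

# Hypothetical toy "thesaurus": word -> {sense_id: annotation probability}.
# Stands in for a lexical resource such as WordNet; the numbers are made up.
SENSES = {
    "customer": {"client.n.01": 1.0},
    "client": {"client.n.01": 0.7, "client.n.02": 0.3},
    "quantity": {"measure.n.02": 1.0},
    "amount": {"measure.n.02": 0.6, "amount.n.02": 0.4},
}


def normalize_label(label):
    """Split a schema label into lowercase tokens and expand known abbreviations."""
    # First split on underscores, hyphens and whitespace.
    parts = re.split(r"[_\-\s]+", label)
    tokens = []
    for part in parts:
        # Then split camelCase / PascalCase words, all-caps runs and digits.
        tokens.extend(re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", part))
    # Lowercase each token and replace it with its expansion if one is known.
    return [ABBREVIATIONS.get(t.lower(), t.lower()) for t in tokens]


def synonym_probability(word_a, word_b):
    """Probability that two words share a sense, assuming independent annotations."""
    senses_a = SENSES.get(word_a, {})
    senses_b = SENSES.get(word_b, {})
    # Sum P(a has sense s) * P(b has sense s) over the senses both words can take.
    return sum(p * senses_b[s] for s, p in senses_a.items() if s in senses_b)
```

For example, `normalize_label("CustQty")` returns `["customer", "quantity"]`, turning an abbreviated compound label into dictionary words that can be annotated, and `synonym_probability("customer", "client")` returns 0.7 under the toy probabilities above.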
Pages: 192-208
Number of pages: 17