Improving XML schema matching performance using Prufer sequences

Cited by: 29
Authors
Algergawy, Alsayed [1]
Schallehn, Eike [1]
Saake, Gunter [1]
Affiliations
[1] Otto Von Guericke Univ, Dept Comp Sci, D-39106 Magdeburg, Germany
Keywords
XML schema matching; Prufer sequences; Structural matching; Matching performance
DOI
10.1016/j.datak.2009.01.001
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Schema matching is a critical step for discovering semantic correspondences among elements in many data-sharing applications. Most existing schema matching algorithms produce scores between schema elements, and so discover only simple matches. Such results only partially solve the problem. Identifying and discovering complex matches is considered one of the biggest obstacles to completely solving the schema matching problem. Another obstacle is the scalability of matching algorithms to large numbers of schemas and to large-scale schemas. To tackle these challenges, in this paper we propose a new XML schema matching framework based on the use of Prufer encoding. In particular, we develop and implement the XPruM system, which consists mainly of two parts: schema preparation and schema matching. First, we parse XML schemas and represent them internally as schema trees. Prufer sequences are constructed for each schema tree and employed to construct a sequence representation of schemas. We capture schema tree semantic information in Label Prufer Sequences (LPS) and schema tree structural information in Number Prufer Sequences (NPS). Then, we develop a new structural matching algorithm exploiting both LPS and NPS. To cope with complex matching discovery, we introduce the concept of compatible nodes to identify semantic correspondences across complex elements first; the matching process is then refined to identify correspondences among simple elements inside each pair of compatible nodes. Our experimental results demonstrate the performance benefits of the XPruM system. © 2009 Elsevier B.V. All rights reserved.
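The abstract does not spell out how the two sequences are built, so the following is only a minimal sketch of one common Prufer-style encoding for rooted, ordered, labeled trees, not the XPruM implementation. It assumes that nodes are numbered in postorder, that the leaf with the smallest number is deleted repeatedly until only the root remains, that the LPS records the label of each deleted node, and that the NPS records the postorder number of its parent. The `Node` class, the `prufer_sequences` function, and the sample schema are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A schema tree node (hypothetical minimal representation)."""
    label: str
    children: list["Node"] = field(default_factory=list)


def prufer_sequences(root):
    """Return (LPS, NPS) for a rooted, ordered, labeled tree.

    Assumption: with postorder numbering, the smallest-numbered leaf at
    every deletion step is simply the next node in postorder, so the
    deletion order is 1, 2, ..., n-1 and the root (number n) is never
    deleted.
    """
    order = []  # (node, parent) pairs in postorder

    def visit(node, parent):
        for child in node.children:
            visit(child, node)
        order.append((node, parent))

    visit(root, None)

    # Postorder numbers start at 1; nodes are keyed by identity because
    # labels may repeat within a schema.
    number = {id(node): i + 1 for i, (node, _) in enumerate(order)}

    lps, nps = [], []
    for node, parent in order:
        if parent is None:  # the root is the last remaining node
            continue
        lps.append(node.label)          # semantic information
        nps.append(number[id(parent)])  # structural information
    return lps, nps


# Hypothetical schema tree: book(title, author(name, email))
schema = Node("book", [
    Node("title"),
    Node("author", [Node("name"), Node("email")]),
])
lps, nps = prufer_sequences(schema)
print(lps)  # ['title', 'name', 'email', 'author']
print(nps)  # [5, 4, 4, 5]
```

Under these assumptions the NPS alone is enough to rebuild the tree shape, while the LPS carries the element names, which is consistent with the abstract's split between structural and semantic information.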
Pages: 728-747
Page count: 20