Usage-based schema matching

Cited by: 23
Authors
Elmeleegy, Hazem [1]
Ouzzani, Mourad [2]
Elmagarmid, Ahmed [1, 2]
Affiliations
[1] Purdue Univ, Dept Comp Sci, W Lafayette, IN 47907 USA
[2] Purdue Univ, Cyber Ctr, W Lafayette, IN 47907 USA
Source
2008 IEEE 24th International Conference on Data Engineering, Vols 1-3 | 2008
Funding
US National Science Foundation
DOI
10.1109/ICDE.2008.4497410
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Existing techniques for schema matching are classified as either schema-based, instance-based, or a combination of both. In this paper, we define a new class of techniques, called usage-based schema matching. The idea is to exploit information extracted from the query logs to find correspondences between attributes in the schemas to be matched. We propose methods to identify co-occurrence patterns between attributes in addition to other features such as their use in joins and with aggregate functions. Several scoring functions are considered to measure the similarity of the extracted features, and a genetic algorithm is employed to find the highest-score mappings between the two schemas. Our technique is suitable for matching schemas even when their attribute names are opaque. It can further be combined with existing techniques to obtain more accurate results. Our experimental study demonstrates the effectiveness of the proposed approach and the benefit of combining it with other existing approaches.
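As a rough illustration of the co-occurrence idea mentioned in the abstract, the sketch below counts how often pairs of attributes appear together in the same logged query. It is not the authors' method: the function name `cooccurrence_counts`, the toy log, and the assumption that each query has already been parsed into a list of attribute names are all hypothetical, and the paper's scoring functions and genetic algorithm are not reproduced here.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(query_attrs):
    """Count how often each unordered attribute pair appears in one query.

    `query_attrs` is an iterable of attribute-name lists, one per logged
    query (assumed to be pre-parsed; parsing SQL is out of scope here).
    """
    counts = Counter()
    for attrs in query_attrs:
        # Deduplicate and sort so each pair has one canonical key.
        for a, b in combinations(sorted(set(attrs)), 2):
            counts[(a, b)] += 1
    return counts


# Hypothetical query log with opaque attribute names.
log = [
    ["a1", "a2"],
    ["a1", "a2", "a3"],
    ["a3"],
]
counts = cooccurrence_counts(log)
```

Because the counts depend only on how attributes are used, not on what they are called, a feature like this remains available when attribute names are opaque.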
Pages: 20 / +
Page count: 2