Appearance-order-based schema matching

Cited by: 1
Authors
Ding, Guohui [1]
Cao, Keyan [2]
Wang, Guoren [2, 3]
Han, Dong [4]
Affiliations
[1] Department of Computer Science, Shenyang Aerospace University, Shenyang
[2] Key Laboratory of Medical Image Computing, Ministry of Education, Northeastern University, Shenyang
[3] College of Information Science and Engineering, Northeastern University, Shenyang
[4] National Marine Data and Information Service, Tianjin
Funding
National Natural Science Foundation of China
Keywords
Appearance order; Attributes; Correspondences; Data integration; Schema matching; Similarity
DOI
10.5626/JCSE.2014.8.2.94
Abstract
Schema matching is widely used in applications such as data integration, ontology merging, data warehousing, and dataspaces. In this paper, we propose a novel matching technique based on the order in which attributes appear in the schema structure of query results. The appearance order reflects how important an attribute is to the user examining the query results. The core idea of our approach is to collect statistics about the appearance order of attributes from the query logs and to use them to find correspondences between attributes in the schemas to be matched. As a first step, we employ a matrix to structure the appearance-order statistics. Then, two scoring functions are considered to measure the similarity of the collected statistics. Finally, a traditional algorithm is employed to find the mapping with the highest score. Our approach can be seen as a complementary member of the family of existing matchers, and can also be combined with them to obtain more accurate results. We validate our approach with an experimental study, whose results demonstrate that the approach is effective and performs well. © 2014. The Korean Institute of Information Scientists and Engineers.
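The three steps named in the abstract (a statistics matrix, a similarity score, a search for the best mapping) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration, since the abstract does not give the details: the matrix layout (counts of attribute by position), the similarity measure (one minus half the L1 distance between normalized position distributions), and the exhaustive permutation search, which stands in for the unspecified "traditional algorithm" and is only practical for very small schemas. The function names and sample query logs are hypothetical, and both schemas are assumed to have the same number of attributes.

```python
from itertools import permutations


def order_matrix(query_logs, attributes):
    """Count how often each attribute appears at each position.

    counts[a][p] is the number of logged result schemas that list
    attribute a at position p (assumed layout, not the paper's).
    """
    index = {attr: i for i, attr in enumerate(attributes)}
    size = len(attributes)
    counts = [[0] * size for _ in range(size)]
    for result_attrs in query_logs:
        for pos, attr in enumerate(result_attrs):
            counts[index[attr]][pos] += 1
    return counts


def similarity(row_a, row_b):
    """Assumed score: 1 minus half the L1 distance between the
    normalized position distributions of two attributes."""
    total_a, total_b = sum(row_a), sum(row_b)
    dist_a = [c / total_a if total_a else 0.0 for c in row_a]
    dist_b = [c / total_b if total_b else 0.0 for c in row_b]
    return 1.0 - 0.5 * sum(abs(x - y) for x, y in zip(dist_a, dist_b))


def best_mapping(attrs_s, matrix_s, attrs_t, matrix_t):
    """Exhaustively search one-to-one mappings for the highest total score."""
    best_score, best_perm = float("-inf"), None
    for perm in permutations(range(len(attrs_t))):
        score = sum(similarity(matrix_s[i], matrix_t[j])
                    for i, j in enumerate(perm))
        if score > best_score:
            best_score, best_perm = score, perm
    mapping = {attrs_s[i]: attrs_t[j] for i, j in enumerate(best_perm)}
    return mapping, best_score


# Hypothetical query logs: each entry is the attribute order of one query result.
attrs_s = ["name", "price", "stock"]
logs_s = [["name", "price", "stock"], ["name", "price"], ["price", "name", "stock"]]
attrs_t = ["title", "cost", "qty"]
logs_t = [["title", "cost", "qty"], ["title", "cost"], ["title", "qty", "cost"]]

mapping, score = best_mapping(
    attrs_s, order_matrix(logs_s, attrs_s),
    attrs_t, order_matrix(logs_t, attrs_t),
)
print(mapping)
```

With these sample logs the search pairs each attribute with the one that tends to occupy the same positions in the other schema. For schemas of realistic size, the factorial search would need to be replaced by a heuristic or assignment-based optimizer.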
Pages: 94-106
Page count: 12