A Methodology and Architecture Embedding Quality Assessment in Data Integration

Cited by: 9
Authors
Martin, Nigel [1]
Poulovassilis, Alexandra [1]
Wang, Jianing [1]
Affiliations
[1] Birkbeck Univ London, Dept Comp Sci & Informat Syst, Malet St, London, England
Source
ACM Journal of Data and Information Quality | 2014 / Vol. 4 / Issue 4
Keywords
Design; Measurement; Data integration; data quality; data quality assessment;
DOI
10.1145/2567663
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Data integration aims to combine heterogeneous information sources and to provide interfaces for accessing the integrated resource. Data integration is a collaborative task that may involve many people with different degrees of experience, knowledge of the application domain, and expectations relating to the integrated resource. It may be difficult to determine and control the quality of an integrated resource due to these factors. In this article, we propose a data integration methodology that has embedded within it iterative quality assessment and improvement of the integrated resource. We also propose an architecture for the realisation of this methodology. The quality assessment is based on an ontology representation of different users' quality requirements and of the main elements of the integrated resource. We use description logic as the formal basis for reasoning about users' quality requirements and for validating that an integrated resource satisfies these requirements. We define quality factors and associated metrics which enable the quality of alternative global schemas for an integrated resource to be assessed quantitatively, and hence the improvement which results from the refinement of a global schema following our methodology to be measured. We evaluate our approach through a large-scale real-life case study in biological data integration in which an integrated resource is constructed from three autonomous proteomics data sources.
Pages: 40