ETL Best Practices for Data Quality Checks in RIS Databases

Cited by: 13
Authors
Azeroual, Otmane [1, 2, 3]
Saake, Gunter [2]
Abuosba, Mohammad [3]
Affiliations
[1] German Ctr Higher Educ Res & Sci Studies DZHW, Schutzenstr 6a, D-10117 Berlin, Germany
[2] Otto von Guericke Univ, Inst Tech & Business Informat Syst, Database Res Grp, Univ Pl 2, D-39106 Magdeburg, Germany
[3] Univ Appl Sci HTW Berlin, Dept Comp Sci & Engn, Wilhelminenhofstr 75 A, D-12459 Berlin, Germany
Source
INFORMATICS-BASEL | 2019 / Vol. 6 / Issue 01
Keywords
research information systems (RIS); heterogeneous information sources; metadata; data integration; data transformation; extraction transformation load (ETL) technology; data quality; RESEARCH INFORMATION-SYSTEMS; METRICS; IMPACT;
DOI
10.3390/informatics6010010
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
The topic of data integration from external data sources or independent IT systems has recently received increasing attention in IT departments as well as at management level, in particular concerning data integration in federated database systems. An example of the latter is commercial research information systems (RIS), which regularly import, cleanse, transform, and prepare for analysis the research information of institutions from a variety of databases. In addition, all of these steps must be carried out at an assured level of quality. As several internal and external data sources are loaded for integration into the RIS, ensuring information quality is becoming increasingly challenging for research institutions. Before research information is transferred to a RIS, it must be checked and cleaned up. Data quality is therefore always an important factor for successful data integration. The removal of data errors (such as duplicates, inconsistent data, and outdated data) and the harmonization of the data structure are essential tasks of data integration using extract, transform, and load (ETL) processes. Data is extracted from the source systems, transformed, and loaded into the RIS. At this point, conflicts between different data sources are identified and resolved, and data quality issues arising during data integration are eliminated. Against this background, our paper presents the process of data transformation in the context of RIS, which provides an overview of the quality of research information in an institution's internal and external data sources during its integration into the RIS. In addition, the question of how to control and improve quality issues during the integration process in RIS is addressed.
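The extract, transform, and load steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the two sample sources, the field names, the `FIELD_MAP` mapping, the `min_year` cutoff for outdated records, and the use of the DOI as the duplicate key are all assumptions made for illustration.

```python
# Hypothetical records from an internal and an external source.
# The two sources use different field names and value formats.
internal_source = [
    {"doi": "10.1000/a1", "title": "Data Quality in RIS", "year": 2018},
    {"doi": "10.1000/a2", "title": "ETL Basics", "year": 1990},
]
external_source = [
    {"DOI": "10.1000/A1 ", "Title": "Data quality in RIS", "PubYear": "2018"},
    {"DOI": "10.1000/b7", "Title": "Metadata Harmonization", "PubYear": "2017"},
]

# Assumed mapping from external field names to the target schema.
FIELD_MAP = {"DOI": "doi", "Title": "title", "PubYear": "year"}


def harmonize(record):
    """Transform step: rename fields and normalize value formats."""
    out = {FIELD_MAP.get(key, key): value for key, value in record.items()}
    out["doi"] = str(out["doi"]).strip().lower()
    out["title"] = str(out["title"]).strip()
    out["year"] = int(out["year"])
    return out


def run_etl(sources, min_year):
    """Extract from each source, transform, and load into one table.

    Records older than min_year are treated as outdated and skipped.
    Records sharing a DOI are treated as duplicates: the first one is
    kept, and a differing later one is reported as a conflict.
    """
    loaded = {}
    conflicts = []
    for source in sources:
        for raw in source:
            rec = harmonize(raw)
            if rec["year"] < min_year:
                continue  # outdated record
            kept = loaded.get(rec["doi"])
            if kept is None:
                loaded[rec["doi"]] = rec
            elif kept != rec:
                conflicts.append((kept, rec))
    return list(loaded.values()), conflicts


ris_table, conflicts = run_etl([internal_source, external_source], min_year=2000)
for row in ris_table:
    print(row)
print("conflicts to review:", len(conflicts))
```

In this sketch, conflicting duplicates are only collected for review rather than merged automatically; how such conflicts should actually be resolved depends on the institution's rules and is left open here.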
Pages: 13