Analyzing data quality issues in research information systems via data profiling

Cited by: 41
Authors
Azeroual, Otmane [1, 2]
Saake, Gunter [2]
Schallehn, Eike [2]
Affiliations
[1] German Ctr Higher Educ Res & Sci Studies DZHW, Schutzenstr 6a, D-10117 Berlin, Germany
[2] Otto von Guericke Univ, Dept Comp Sci, Inst Tech & Business Informat Syst, Database Res Grp, POB 4120, D-39106 Magdeburg, Germany
Keywords
Current research information systems; CRIS; Research information systems; RIS; Research information; Data sources; Data quality; Extraction transformation load; ETL; Data analysis; Data profiling; Science system; Standardization
DOI
10.1016/j.ijinfomgt.2018.02.007
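Chinese Library Classification (CLC) number
G25 [Library science, librarianship]; G35 [Information science, information work]
Discipline classification code
1205; 120501
Abstract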
The success or failure of a research information system (RIS) in a scientific institution depends largely on the quality of the data that underlies the RIS applications. Even the most polished Business Intelligence (BI) tools (reporting, etc.) are worthless when they display incorrect, incomplete, or inconsistent data. An integral part of every RIS is thus the integration of data from the operative systems. Before the integration process (extraction, transformation, load; ETL) of a source system begins, a thorough analysis of the source data is required. With the support of a data quality check, the causes of quality problems can usually be detected. The corresponding analyses are performed with data profiling, which gives a good picture of the state of the data. In this paper, methods of data profiling are presented in order to gain an overview of the quality of the data in the source systems before their integration into the RIS. With the help of data profiling, scientific institutions can not only evaluate their research information and report on its quality, but also examine the dependencies and redundancies between data fields and correct them more effectively within their RIS.
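As a rough illustration of the kind of checks the abstract describes, the following is a minimal sketch of profiling source records before integration: per-field completeness and distinct counts, a check for violated dependencies between two fields, and detection of redundant (duplicate) rows. It is not the authors' method. The function names, the field names (`author`, `institute`, `year`), and the sample records are all hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def profile_column(rows, field):
    """Completeness and distinct-value count for one field (hypothetical helper)."""
    values = [r.get(field) for r in rows]
    # Treat None and the empty string as missing values.
    present = [v for v in values if v not in (None, "")]
    return {
        "completeness": len(present) / len(values) if values else 0.0,
        "distinct": len(set(present)),
    }


def violates_dependency(rows, lhs, rhs):
    """Return each lhs value that maps to more than one rhs value,
    i.e. the cases where the dependency lhs -> rhs does not hold."""
    seen = defaultdict(set)
    for r in rows:
        seen[r.get(lhs)].add(r.get(rhs))
    return {k: v for k, v in seen.items() if len(v) > 1}


def duplicate_rows(rows):
    """Return one copy of every record that occurs more than once."""
    counts = Counter(tuple(sorted(r.items())) for r in rows)
    return [dict(k) for k, n in counts.items() if n > 1]


# Invented sample records standing in for a source system export.
rows = [
    {"author": "A. Example", "institute": "Inst A", "year": "2017"},
    {"author": "A. Example", "institute": "Inst B", "year": "2018"},
    {"author": "B. Sample", "institute": "Inst A", "year": ""},
    {"author": "B. Sample", "institute": "Inst A", "year": ""},
]

print(profile_column(rows, "year"))
print(violates_dependency(rows, "author", "institute"))
print(duplicate_rows(rows))
```

On these sample records, the `year` field is only half complete, one author is associated with two institutes, and one record appears twice. Findings of this kind would be reviewed before the ETL step rather than corrected automatically.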
Pages: 50-56
Number of pages: 7