How to Inspect and Measure Data Quality about Scientific Publications: Use Case of Wikipedia and CRIS Databases

Cited by: 3
Authors
Azeroual, Otmane [1]
Lewoniewski, Wlodzimierz [2]
Affiliations
[1] German Ctr Higher Educ Res & Sci Studies DZHW, D-10117 Berlin, Germany
[2] Poznan Univ Econ & Business, Dept Informat Syst, PL-61875 Poznan, Poland
Keywords
Wikipedia; current research information systems (CRIS); publications data; data quality; objective quality dimensions; research data processing; data management; data analysis; data measurement; completeness; consistency; correctness; timeliness; efficient decision-making; RESEARCH INFORMATION-SYSTEMS; METRICS;
DOI
10.3390/a13050107
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Quality assurance of publication data in collaborative knowledge bases and in current research information systems (CRIS) is becoming increasingly relevant with the use of freely available spatial information in different application scenarios. When integrating such data into CRIS, it is necessary to be able to recognize and assess its quality. Only then is it possible to compile from the available data a result that fulfills its purpose for the user, namely to deliver reliable data and information. This paper discusses the quality problems of source metadata in Wikipedia and CRIS. Based on real data from over 40 million Wikipedia articles in various languages, we performed a preliminary quality analysis of the metadata of scientific publications using a data quality tool. So far, no data quality measurements have been programmed in Python to assess the quality of metadata from scientific publications in Wikipedia and CRIS. With this in mind, we implemented the methods and algorithms as code but present them in this paper as pseudocode, in order to measure quality along objective data quality dimensions such as completeness, correctness, consistency, and timeliness. This was prepared as a macro service so that users can apply the measurement results, together with the program code, to make a statement about the metadata of their scientific publications, and so that management can rely on high-quality data when making decisions.
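The four objective dimensions named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code or pseudocode. The field names (`title`, `doi`, `year`, `last_updated`), the DOI pattern, the cross-field rule, and the one-year freshness window are all assumptions chosen for illustration only.

```python
import re
from datetime import date

# Hypothetical required fields and DOI pattern (assumptions, not from the paper).
REQUIRED = ("title", "doi", "year")
DOI_RE = re.compile(r"^10\.\d{4,9}/\S+$")


def completeness(records):
    """Share of required fields that are filled in across all records."""
    filled = sum(
        1 for r in records for f in REQUIRED if r.get(f) not in (None, "")
    )
    return filled / (len(records) * len(REQUIRED))


def correctness(records):
    """Share of non-empty DOIs that match the assumed DOI pattern."""
    dois = [r["doi"] for r in records if r.get("doi")]
    if not dois:
        return 1.0
    return sum(1 for d in dois if DOI_RE.match(d)) / len(dois)


def consistency(records):
    """Share of records whose publication year is not after the last update."""
    ok = sum(1 for r in records if r["year"] <= r["last_updated"].year)
    return ok / len(records)


def timeliness(records, today, max_age_days=365):
    """Share of records updated within the assumed freshness window."""
    fresh = sum(
        1 for r in records if (today - r["last_updated"]).days <= max_age_days
    )
    return fresh / len(records)


# Made-up sample metadata, for illustration only.
records = [
    {"title": "A", "doi": "10.3390/a13050107", "year": 2020,
     "last_updated": date(2020, 5, 1)},
    {"title": "B", "doi": "doi:bad", "year": 2021,
     "last_updated": date(2019, 1, 1)},
    {"title": "", "doi": None, "year": 2018,
     "last_updated": date(2020, 4, 1)},
]
today = date(2020, 6, 1)

for name, value in [
    ("completeness", completeness(records)),
    ("correctness", correctness(records)),
    ("consistency", consistency(records)),
    ("timeliness", timeliness(records, today)),
]:
    print(f"{name}: {value:.2f}")
```

Each function returns a ratio between 0 and 1, so the four scores can be compared directly or reported per data source. The actual rules would depend on the metadata schema used in Wikipedia or the CRIS in question.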
Pages: 18
Related papers
31 in total
[1]  
[Anonymous], 2007, ARXIV07052106
[2]  
Azeroual M., 2017, INT J COMPUTER SCI A, V15, P82
[3]  
Azeroual O., 2018, J DIGITAL INFORM MAN, V16, P12
[4]   ETL Best Practices for Data Quality Checks in RIS Databases [J].
Azeroual, Otmane ;
Saake, Gunter ;
Abuosba, Mohammad .
INFORMATICS-BASEL, 2019, 6 (01)
[5]   Quality of Research Information in RIS Databases: A Multidimensional Approach [J].
Azeroual, Otmane ;
Saake, Gunter ;
Abuosba, Mohammad ;
Schopfel, Joachim .
BUSINESS INFORMATION SYSTEMS, PT I, 2019, 353 :337-349
[6]   Quality Issues of CRIS Data: An Exploratory Investigation with Universities from Twelve Countries [J].
Azeroual, Otmane ;
Schopfel, Joachim .
PUBLICATIONS, 2019, 7 (01)
[7]   Data measurement in research information systems: metrics for the evaluation of data quality [J].
Azeroual, Otmane ;
Saake, Gunter ;
Wastl, Jurgen .
SCIENTOMETRICS, 2018, 115 (03) :1271-1290
[8]   Analyzing data quality issues in research information systems via data profiling [J].
Azeroual, Otmane ;
Saake, Gunter ;
Schallehn, Eike .
INTERNATIONAL JOURNAL OF INFORMATION MANAGEMENT, 2018, 41 :50-56
[9]  
Batini Carlo, 2016, Data and Information Quality: Dimensions, Principles and Techniques
[10]   International Data on Measuring Management Practices [J].
Bloom, Nicholas ;
Lemos, Renata ;
Sadun, Raffaella ;
Scur, Daniela ;
Van Reenen, John .
AMERICAN ECONOMIC REVIEW, 2016, 106 (05) :152-156