Measuring data quality in information systems research

Cited by: 26
Authors
Timmerman, Yoram [1]
Bronselaer, Antoon [1]
Affiliations
[1] Univ Ghent, Dept Telecommun & Informat Proc, Sint Pietersnieuwstr 41, B-9000 Ghent, Belgium
Keywords
Data quality; Rule-based measurement; Information systems; Uncertainty modelling; MODEL;
DOI
10.1016/j.dss.2019.113138
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Although contemporary research relies to a large extent on data, data quality in Information Systems research is a subject that has not received much attention until now. In this paper, a framework is presented for the measurement of scientific data quality using the principles of rule-based measurement. The proposed framework is capable of handling data quality problems due to both incorrect execution and incorrect description of data collection and validation processes. It is then argued that uncertainty can arise during the measurement, which complicates data quality assessment. The framework is therefore extended to handle uncertainty about the truth value of predicates. Instead of a numerical quality level, data quality is then expressed as either a probability distribution or a possibility distribution over the ordinal quality scale. Finally, it is also shown how quality thresholds can be formulated based on the results of the quality measurement. The usefulness of the proposed framework is illustrated throughout the paper with an example of the construction of a possible survey data quality measurement system and, subsequently, the application of that system on a realistic example.
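The idea of expressing quality as a probability distribution over an ordinal scale, and then checking it against a threshold, can be sketched as follows. This is only an illustration under stated assumptions, not the framework defined in the paper: it assumes the quality level is simply the number of satisfied rules, that the truth values of the predicates are independent, and that a threshold takes the form "level at least k with probability at least p". The function names and the three survey rules are hypothetical.

```python
from typing import List


def level_distribution(p_true: List[float]) -> List[float]:
    """Return P(level = k) for k = 0..n, where the level is the number of
    satisfied predicates and p_true[i] is the probability that predicate i
    holds. Assumes independent predicates (an assumption of this sketch)."""
    dist = [1.0]  # with no predicates, level 0 is certain
    for p in p_true:
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1.0 - p)   # predicate fails: level unchanged
            nxt[k + 1] += mass * p       # predicate holds: level goes up by one
        dist = nxt
    return dist


def meets_threshold(dist: List[float], min_level: int, min_prob: float) -> bool:
    """Hypothetical threshold: the level is at least min_level with
    probability at least min_prob."""
    return sum(dist[min_level:]) >= min_prob


# Hypothetical survey rules with uncertain truth values:
# sampling documented (certain), response rate reported (likely),
# questionnaire validated (unknown, so 0.5).
p_true = [1.0, 0.9, 0.5]
dist = level_distribution(p_true)

for level, prob in enumerate(dist):
    print(f"level {level}: {prob:.2f}")
print("acceptable:", meets_threshold(dist, min_level=2, min_prob=0.9))
```

A possibilistic variant would replace the sum and product with maximum and minimum operations; it is omitted here because the record does not give the details.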
Pages: 7