A Taxonomy of Data Quality Challenges in Empirical Software Engineering

Authors
Bosu, Michael Franklin [1]
MacDonell, Stephen G. [1]
Affiliations
[1] Auckland Univ Technol, SERL, Sch Comp & Math Sci, Auckland, New Zealand
Source
2013 22ND AUSTRALASIAN CONFERENCE ON SOFTWARE ENGINEERING (ASWEC) | 2013
Keywords
data quality; provenance; commercial sensitivity; accessibility; trustworthiness; empirical software engineering; CLASS NOISE; METRICS; PREDICTION; IMPUTATION; IMPACT; BASE;
DOI
Not available
Chinese Library Classification number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Reliable empirical models such as those used in software effort estimation or defect prediction are inherently dependent on the data from which they are built. As demands for process and product improvement continue to grow, the quality of the data used in measurement and prediction systems warrants increasingly close scrutiny. In this paper we propose a taxonomy of data quality challenges in empirical software engineering, based on an extensive review of prior research. We consider current assessment techniques for each quality issue and proposed mechanisms to address these issues, where available. Our taxonomy classifies data quality issues into three broad areas: first, characteristics of data that mean they are not fit for modeling; second, data set characteristics that lead to concerns about the suitability of applying a given model to another data set; and third, factors that prevent or limit data accessibility and trust. We identify this latter area as of particular need in terms of further research.
Pages: 97-106
Page count: 10