A Taxonomy of Data Quality Challenges in Empirical Software Engineering

Authors
Bosu, Michael Franklin [1]
MacDonell, Stephen G. [1]
Affiliations
[1] Auckland Univ Technol, SERL, Sch Comp & Math Sci, Auckland, New Zealand
Source
2013 22nd Australasian Conference on Software Engineering (ASWEC), 2013
Keywords
data quality; provenance; commercial sensitivity; accessibility; trustworthiness; empirical software engineering; CLASS NOISE; METRICS; PREDICTION; IMPUTATION; IMPACT; BASE
DOI
Not available
Chinese Library Classification number
TP31 [Computer Software]
Subject classification codes
081202; 0835
Abstract
Reliable empirical models, such as those used in software effort estimation or defect prediction, are inherently dependent on the data from which they are built. As demands for process and product improvement continue to grow, the quality of the data used in measurement and prediction systems warrants increasingly close scrutiny. In this paper we propose a taxonomy of data quality challenges in empirical software engineering, based on an extensive review of prior research. We consider current assessment techniques for each quality issue and, where available, proposed mechanisms to address these issues. Our taxonomy classifies data quality issues into three broad areas: first, characteristics of data that mean they are not fit for modeling; second, data set characteristics that lead to concerns about the suitability of applying a given model to another data set; and third, factors that prevent or limit data accessibility and trust. We identify this last area as being in particular need of further research.
Pages: 97-106
Page count: 10