A Taxonomy of Data Quality Challenges in Empirical Software Engineering

Authors
Bosu, Michael Franklin [1]
MacDonell, Stephen G. [1]
Affiliations
[1] Auckland Univ Technol, SERL, Sch Comp & Math Sci, Auckland, New Zealand
Source
2013 22ND AUSTRALASIAN CONFERENCE ON SOFTWARE ENGINEERING (ASWEC) | 2013
Keywords
data quality; provenance; commercial sensitivity; accessibility; trustworthiness; empirical software engineering; CLASS NOISE; METRICS; PREDICTION; IMPUTATION; IMPACT; BASE;
DOI
Not available
Chinese Library Classification
TP31 [Computer Software]
Discipline classification codes
081202; 0835
Abstract
Reliable empirical models such as those used in software effort estimation or defect prediction are inherently dependent on the data from which they are built. As demands for process and product improvement continue to grow, the quality of the data used in measurement and prediction systems warrants increasingly close scrutiny. In this paper we propose a taxonomy of data quality challenges in empirical software engineering, based on an extensive review of prior research. We consider current assessment techniques for each quality issue and proposed mechanisms to address these issues, where available. Our taxonomy classifies data quality issues into three broad areas: first, characteristics of data that mean they are not fit for modeling; second, data set characteristics that lead to concerns about the suitability of applying a given model to another data set; and third, factors that prevent or limit data accessibility and trust. We identify this latter area as of particular need in terms of further research.
Pages: 97-106
Page count: 10