A Taxonomy of Data Quality Challenges in Empirical Software Engineering

Cited by: 0
Authors
Bosu, Michael Franklin [1]
MacDonell, Stephen G. [1]
Affiliations
[1] Auckland Univ Technol, SERL, Sch Comp & Math Sci, Auckland, New Zealand
Source
2013 22ND AUSTRALASIAN CONFERENCE ON SOFTWARE ENGINEERING (ASWEC) | 2013
Keywords
data quality; provenance; commercial sensitivity; accessibility; trustworthiness; empirical software engineering; CLASS NOISE; METRICS; PREDICTION; IMPUTATION; IMPACT; BASE;
DOI
Not available
Chinese Library Classification number
TP31 [Computer software];
Discipline classification codes
081202 ; 0835 ;
Abstract
Reliable empirical models such as those used in software effort estimation or defect prediction are inherently dependent on the data from which they are built. As demands for process and product improvement continue to grow, the quality of the data used in measurement and prediction systems warrants increasingly close scrutiny. In this paper we propose a taxonomy of data quality challenges in empirical software engineering, based on an extensive review of prior research. We consider current assessment techniques for each quality issue and proposed mechanisms to address these issues, where available. Our taxonomy classifies data quality issues into three broad areas: first, characteristics of data that mean they are not fit for modeling; second, data set characteristics that lead to concerns about the suitability of applying a given model to another data set; and third, factors that prevent or limit data accessibility and trust. We identify this latter area as of particular need in terms of further research.
Pages: 97-106
Page count: 10