Context
Software engineering (SE) experiments often have small sample sizes. This can result in data sets with non-normal characteristics, which poses problems because standard parametric meta-analysis, using the standardized mean difference (StdMD) effect size, assumes normally distributed sample data. Small sample sizes and non-normal data characteristics can also lead to unreliable estimates of parametric effect sizes. Meta-analysis is further complicated when experiments use complex experimental designs, such as two-group and four-group cross-over designs, which are popular in SE experiments.

Objective
Our objective was to develop a validated and robust meta-analysis method that addresses the problems of small sample sizes and complex experimental designs without relying on data samples being normally distributed.

Method
To illustrate the challenges, we used real SE data sets. We built upon previous research and developed a robust meta-analysis method able to deal with challenges typical of SE experiments. We validated our method via simulations comparing StdMD with two robust alternatives: the probability of superiority ($\hat{p}$) and Cliff's d.

Results
We confirmed that many SE data sets are small and that small experiments run the risk of exhibiting non-normal properties, which can cause problems for analysing families of experiments. For simulations of individual experiments and meta-analyses of families of experiments, $\hat{p}$ and Cliff's d consistently outperformed StdMD by exhibiting negligible small-sample bias. They also had better power for log-normal and Laplace samples, although lower power for normal and gamma samples. Tests based on $\hat{p}$ always had power better than or equal to tests based on Cliff's d, and across all but one simulation condition, $\hat{p}$ Type 1 error rates were less biased.

Conclusions
Using $\hat{p}$ is a low-risk option for analysing and meta-analysing data from small-sample SE randomized experiments. Parametric methods are preferable only if you have prior knowledge of the data distribution.
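The two robust effect sizes compared above have simple nonparametric definitions: the probability of superiority $\hat{p} = \Pr(X > Y) + 0.5\,\Pr(X = Y)$ and Cliff's $d = \Pr(X > Y) - \Pr(X < Y)$, which are related by $d = 2\hat{p} - 1$. The following is a minimal sketch of the two estimators for two independent samples; it illustrates the statistics themselves, not the paper's validated meta-analysis procedure, and the function names and the example data are illustrative only.

```python
import numpy as np

def prob_superiority(x, y):
    """Estimate p-hat = Pr(X > Y) + 0.5 * Pr(X = Y) over all (x_i, y_j) pairs.

    Equivalent to the Mann-Whitney U statistic divided by m * n.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Pairwise differences: diff[i, j] = x[i] - y[j] for every pair.
    diff = x[:, None] - y[None, :]
    return np.mean(diff > 0) + 0.5 * np.mean(diff == 0)

def cliffs_d(x, y):
    """Estimate Cliff's d = Pr(X > Y) - Pr(X < Y).

    Algebraically equal to 2 * p-hat - 1.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    diff = x[:, None] - y[None, :]
    return np.mean(diff > 0) - np.mean(diff < 0)

# Illustrative use with small, skewed (log-normal) samples, the setting
# where the abstract reports these robust statistics outperform StdMD:
rng = np.random.default_rng(42)
treatment = rng.lognormal(mean=0.5, sigma=1.0, size=15)
control = rng.lognormal(mean=0.0, sigma=1.0, size=15)
p_hat = prob_superiority(treatment, control)
d = cliffs_d(treatment, control)
print(f"p-hat = {p_hat:.3f}, Cliff's d = {d:.3f}")  # d == 2 * p_hat - 1
```

Because both statistics depend only on the ordering of observations, not their magnitudes, they are unaffected by the heavy tails and skew that undermine StdMD in small samples.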