A practical guide to controlled experiments of software engineering tools with human participants

Cited by: 130
Authors
Ko, Andrew J. [1 ]
LaToza, Thomas D. [2 ]
Burnett, Margaret M. [3 ]
Affiliations
[1] Univ Washington, Informat Sch, Seattle, WA 98195 USA
[2] Univ Calif Irvine, Dept Informat, Irvine, CA USA
[3] Oregon State Univ, Sch Elect Engn & Comp Sci, Corvallis, OR 97331 USA
Funding
U.S. National Science Foundation;
Keywords
Research methodology; Tools; Human participants; Human subjects; Experiments; COMPUTER-SCIENCE; EFFECT SIZE; IMPACT;
DOI
10.1007/s10664-013-9279-3
CLC Number
TP31 [Computer Software];
Discipline Codes
081202; 0835;
Abstract
Empirical studies, often in the form of controlled experiments, have been widely adopted in software engineering research as a way to evaluate the merits of new software engineering tools. However, controlled experiments in which human participants actually use new tools are still rare, and when they are conducted, some suffer from serious validity problems. Recent research has also shown that many software engineering researchers view this form of tool evaluation as too risky and too difficult to conduct, as such experiments might ultimately lead to inconclusive or negative results. In this paper, we aim both to help researchers minimize the risks of this form of tool evaluation and to increase its quality, by offering practical methodological guidance on designing and running controlled experiments with developers. Our guidance fills gaps in the empirical literature by explaining, from a practical perspective, options in the recruitment and selection of human participants, informed consent, experimental procedures, demographic measurements, group assignment, training, the selection and design of tasks, the measurement of common outcome variables such as success and time on task, and study debriefing. Throughout, we situate this guidance in the results of a new systematic review of the tool evaluations published in over 1,700 software engineering papers from 2001 to 2011.
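To make two of the abstract's topics concrete, below is a minimal sketch (not taken from the paper) of random group assignment and a comparison of time-on-task between conditions. The participant IDs, timing data, and the choice of a Mann-Whitney U test are illustrative assumptions, not prescriptions from the guide.

    # Illustrative sketch only: randomly assign participants to control
    # and treatment groups, then compare time-on-task nonparametrically.
    # All data below are hypothetical.
    import random
    from scipy.stats import mannwhitneyu

    participants = [f"P{i:02d}" for i in range(1, 21)]  # hypothetical IDs
    random.seed(42)               # fixed seed so the assignment is reproducible
    random.shuffle(participants)
    half = len(participants) // 2
    control, treatment = participants[:half], participants[half:]

    # Hypothetical time-on-task measurements in seconds, one per participant.
    control_times = [310, 295, 402, 350, 288, 330, 365, 298, 410, 342]
    treatment_times = [250, 270, 305, 240, 320, 260, 280, 295, 255, 310]

    # A Mann-Whitney U test avoids assuming task times are normally distributed.
    stat, p = mannwhitneyu(control_times, treatment_times, alternative="two-sided")
    print(f"U = {stat}, p = {p:.3f}")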
Pages: 110-141
Number of pages: 32