Context: More than half the literature on software effort estimation (SEE) focuses on model comparisons. Each such comparison requires a sampling method (SM) to generate the train and test sets. Different authors use different SMs, such as leave-one-out (LOO), 3Way and 10Way cross-validation. While LOO is a deterministic algorithm, the N-way methods use random selection to build their train and test sets. This introduces the problem of conclusion instability, where different authors rank effort estimators in different ways.
Objective: To reduce conclusion instability by removing the effects of a sampling method's random test-case generation.
Method: Calculate bias and variance (B&V) values under the assumption that a learner trained on the whole dataset is the true model; then demonstrate that the B&V and runtime values for LOO are similar to those of N-way by running 90 different algorithms on 20 different SEE datasets. For each algorithm, collect the runtimes and the B&V values under LOO, 3Way and 10Way.
Results: We observed that (1) the majority of the algorithms have statistically indistinguishable B&V values under the different SMs and (2) the different SMs have similar runtimes.
Conclusion: In terms of their generated B&V values and runtimes, there is no reason to prefer N-way over LOO. In terms of reproducibility, LOO removes one cause of conclusion instability (the random selection of train and test sets). Therefore, we deprecate N-way and endorse LOO validation for assessing effort models.
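
To make the Method concrete, the following is a minimal sketch of the comparison described above: it treats a model trained on the whole dataset as the "true" model and contrasts deterministic LOO against randomized 3Way and 10Way splits. The ordinary-least-squares learner, the synthetic data and the exact bias and variance formulas used here are illustrative assumptions, not the study's actual implementation.

import numpy as np

rng = np.random.default_rng(0)

def fit_ols(X, y):
    # Ordinary least squares with an intercept term (illustrative learner only).
    Xb = np.c_[np.ones(len(X)), X]
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return w

def predict(w, X):
    return np.c_[np.ones(len(X)), X] @ w

def folds(n, k, rng):
    # k == n gives deterministic LOO; k < n gives a randomized N-way split.
    idx = np.arange(n) if k == n else rng.permutation(n)
    return np.array_split(idx, k)

def bias_variance(X, y, k, rng):
    true_pred = predict(fit_ols(X, y), X)      # "true" model: trained on all data
    held_out = np.empty(len(y))                # one held-out prediction per project
    for test in folds(len(y), k, rng):
        train = np.setdiff1d(np.arange(len(y)), test)
        held_out[test] = predict(fit_ols(X[train], y[train]), X[test])
    bias = np.mean(held_out - true_pred)       # systematic offset from the whole-data model
    variance = np.var(held_out)                # spread of the held-out predictions
    return bias, variance

# Purely synthetic stand-in for an SEE dataset: 60 projects, 3 features.
X = rng.normal(size=(60, 3))
y = X @ np.array([3.0, 1.5, -2.0]) + rng.normal(scale=0.5, size=60)

for name, k in [("LOO", len(y)), ("3Way", 3), ("10Way", 10)]:
    b, v = bias_variance(X, y, k, rng)
    print(f"{name:>5}: bias={b:+.3f}  variance={v:.3f}")

Rerunning this script reproduces the LOO numbers exactly, because its folds do not depend on the random seed, whereas the 3Way and 10Way numbers change with the seed; this is the reproducibility point behind the Conclusion's preference for LOO.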