Evaluating logistic regression models to estimate software project outcomes

Cited by: 32
Authors
Cerpa, Narciso [1]
Bardeen, Matthew [1]
Kitchenham, Barbara [2]
Verner, June [3]
Affiliations
[1] Univ Talca, Fac Ingn, Curico, Chile
[2] Univ Keele, Sch Comp & Math, Keele ST5 5BG, Staffs, England
[3] Univ New S Wales, Sydney, NSW, Australia
Keywords
Project outcome; Tailored cut-off; ROC analysis; Single-company data; Cross-company data; Classifier evaluation; SUCCESS; PRACTITIONERS; PERCEPTIONS; PREDICTION; ACCEPTANCE; THINK; EASE
DOI
10.1016/j.infsof.2010.03.011
Chinese Library Classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Context: Software has been developed since the 1960s but the success rate of software development projects is still low. During the development of software, the probability of success is affected by various practices or aspects. To date, it is not clear which of these aspects are more important in influencing project outcome.
Objective: In this research, we identify aspects which could influence project success, build prediction models based on the aspects using data collected from multiple companies, and then test their performance on data from a single organization.
Method: A survey-based empirical investigation was used to examine variables and factors that contribute to project outcome. Variables that were highly correlated to project success were selected and the set of variables was reduced to three factors by using principal components analysis. A logistic regression model was built for both the set of variables and the set of factors, using heterogeneous data collected from two different countries and a variety of organizations. We tested these models by using a homogeneous hold-out dataset from one organization. We used the receiver operating characteristic (ROC) analysis to compare the performance of the variable and factor-based models when applied to the homogeneous dataset.
Results: We found that using raw variables or factors in the logistic regression models did not make any significant difference in predictive capability. The prediction accuracy of these models is more balanced when the cut-off is set to the ratio of success to failures in the datasets used to build the models. We found that the raw variable and factor-based models predict significantly better than random chance.
Conclusion: We conclude that an organization wishing to estimate whether a project will succeed or fail may use a model created from heterogeneous data derived from multiple organizations.
© 2010 Elsevier B.V. All rights reserved.
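The tailored cut-off and ROC comparison mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code or data: the probabilities and labels below are made up, the function names are hypothetical, and the cut-off is assumed to mean the proportion of successful projects in the model-building data, which is one reading of "the ratio of success to failures". The sketch only shows the mechanics: classify hold-out projects at a cut-off, report sensitivity and specificity, and compute the area under the ROC curve as the chance that a random successful project is scored above a random failed one.

```python
def confusion_rates(probs, labels, cutoff):
    """Return (sensitivity, specificity), predicting success when prob >= cutoff."""
    tp = sum(1 for p, y in zip(probs, labels) if y == 1 and p >= cutoff)
    tn = sum(1 for p, y in zip(probs, labels) if y == 0 and p < cutoff)
    pos = sum(labels)
    neg = len(labels) - pos
    return tp / pos, tn / neg


def auc(probs, labels):
    """Area under the ROC curve via pairwise comparison (ties count one half)."""
    pos = [p for p, y in zip(probs, labels) if y == 1]
    neg = [p for p, y in zip(probs, labels) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical outcomes in the model-building (cross-company) data: 1 = success.
train_labels = [1, 1, 1, 0, 1, 1, 0, 1]

# Assumption: tailored cut-off = share of successes in the training data.
tailored_cutoff = sum(train_labels) / len(train_labels)

# Hypothetical predicted success probabilities for a single-company hold-out set.
holdout_probs = [0.95, 0.85, 0.80, 0.78, 0.65, 0.60, 0.55, 0.30]
holdout_labels = [1, 1, 1, 1, 0, 0, 1, 0]

for name, cutoff in [("default", 0.5), ("tailored", tailored_cutoff)]:
    sens, spec = confusion_rates(holdout_probs, holdout_labels, cutoff)
    print(f"{name} cut-off {cutoff:.2f}: sensitivity={sens:.2f}, specificity={spec:.2f}")

print(f"AUC = {auc(holdout_probs, holdout_labels):.3f}")
```

An AUC of 0.5 corresponds to random chance, so values well above 0.5 indicate that a model ranks successful projects ahead of failed ones more often than not. Whether a tailored cut-off balances sensitivity and specificity on real data depends on the data; the toy numbers here are not evidence either way.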
Pages: 934-944
Number of pages: 11