Local versus Global Lessons for Defect Prediction and Effort Estimation

Cited: 168
Authors
Menzies, Tim [1 ]
Butcher, Andrew [1 ]
Cok, David [2 ]
Marcus, Andrian [3 ]
Layman, Lucas [4 ]
Shull, Forrest [4 ]
Turhan, Burak [5 ]
Zimmermann, Thomas [6 ]
Affiliations
[1] W Virginia Univ, Lane Dept Comp Sci & Elect Engn, Morgantown, WV 26506 USA
[2] GrammaTech, Ithaca, NY USA
[3] Wayne State Univ, Dept Comp Sci, Detroit, MI 48202 USA
[4] Univ Maryland, Fraunhofer Ctr, College Pk, MD 20742 USA
[5] Univ Oulu, Dept Informat Proc Sci, Oulu, Finland
[6] Microsoft Res, Res Software Engn Grp, Redmond, WA USA
Funding
Academy of Finland; US National Science Foundation
Keywords
Data mining; clustering; defect prediction; effort estimation
Keywords Plus: object-oriented design metrics; prone classes; empirical validation; software metrics; identification; complexity; quality; code
DOI
10.1109/TSE.2012.83
Chinese Library Classification
TP31 [Computer software]
Discipline codes
081202; 0835
Abstract
Existing research is unclear on how to generate lessons learned for defect prediction and effort estimation. Should we seek lessons that are global to multiple projects or just local to particular projects? This paper aims to comparatively evaluate local versus global lessons learned for effort estimation and defect prediction. We applied automated clustering tools to effort and defect datasets from the PROMISE repository. Rule learners generated lessons learned from all the data, from local projects, or just from each cluster. The results indicate that the lessons learned after combining small parts of different data sources (i.e., the clusters) were superior to either generalizations formed over all the data or local lessons formed from particular projects. We conclude that when researchers attempt to draw lessons from some historical data source, they should 1) ignore any existing local divisions into multiple sources, 2) cluster across all available data, then 3) restrict the learning of lessons to the clusters from other sources that are nearest to the test data.
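The three-step recipe in the abstract's conclusion can be sketched in code. This is a minimal illustration only: plain k-means with a deterministic farthest-point initialization and a majority-label "lesson" per cluster stand in for the paper's own clusterer and rule learner, and the function name is a hypothetical choice.

```python
import numpy as np

def cluster_then_learn(X_train, y_train, X_test, k=2, iters=20):
    """Sketch of the recipe: pool all rows regardless of which project
    they came from, cluster them, learn one local "lesson" per cluster,
    and answer each test row with the lesson of its nearest cluster.
    (Illustrative stand-ins, not the paper's exact tooling.)"""
    X_train, X_test = np.asarray(X_train, float), np.asarray(X_test, float)
    y_train = np.asarray(y_train)

    # Deterministic farthest-point initialization spreads the seeds out.
    centroids = [X_train[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(X_train - c, axis=1) for c in centroids], axis=0)
        centroids.append(X_train[d.argmax()])
    centroids = np.array(centroids)

    # Step 2 of the recipe: cluster across ALL available data,
    # ignoring any existing division into separate projects.
    for _ in range(iters):
        labels = np.linalg.norm(X_train[:, None] - centroids[None], axis=2).argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centroids[c] = X_train[labels == c].mean(axis=0)
    labels = np.linalg.norm(X_train[:, None] - centroids[None], axis=2).argmin(axis=1)

    # One local "lesson" per cluster: here simply the majority class label.
    lessons = {c: np.bincount(y_train[labels == c]).argmax() if np.any(labels == c) else 0
               for c in range(k)}

    # Step 3: answer each test row from the cluster nearest to it.
    nearest = np.linalg.norm(X_test[:, None] - centroids[None], axis=2).argmin(axis=1)
    return np.array([lessons[c] for c in nearest])
```

On well-separated data the local lessons track their own cluster's labels rather than a single global rule fit to everything at once.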
Pages: 822-834
Page count: 13