Tuning for software analytics: Is it really necessary?

Cited by: 143
Authors
Fu, Wei [1]
Menzies, Tim [1]
Shen, Xipeng [1]
Affiliations
[1] N Carolina State Univ, Dept Comp Sci, Raleigh, NC 27695 USA
Funding
National Science Foundation (USA)
Keywords
Defect prediction; CART; Random forest; Differential evolution; Search-based software engineering; Search; Optimization
DOI
10.1016/j.infsof.2016.04.017
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Context: Data miners have been widely used in software engineering to, say, generate defect predictors from static code measures. Such static code defect predictors perform well compared to manual methods, and they are both easy to use and useful. But one of the "black arts" of data mining is setting the tunings that control the miner. Objective: We seek a simple, automatic, and very effective method for finding those tunings. Method: For each experiment with different data sets (from open source Java systems), we ran differential evolution as an optimizer to explore the tuning space (as a first step), then tested the tunings using hold-out data. Results: Contrary to our prior expectations, we found these tunings were remarkably simple: it only required tens, not thousands, of attempts to obtain very good results. For example, when learning software defect predictors, this method can quickly find tunings that alter detection precision from 0% to 60%. Conclusion: Since (1) the improvements are so large, and (2) the tuning is so simple, we need to change standard methods in software analytics. At least for defect prediction, it is no longer enough to just run a data miner and present the result without conducting a tuning optimization study. The implication for other kinds of analytics is now an open and pressing issue. © 2016 Elsevier B.V. All rights reserved.
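The optimizer named in the Method can be illustrated with a minimal sketch of differential evolution (the common DE/rand/1/bin variant). Everything specific here is an assumption for illustration, not taken from the paper: the function names, the population size, the `f` and `cr` settings, the number of generations, and above all `toy_score`, which is a hypothetical stand-in for "train a learner with these tunings and score it on tuning data".

```python
import random


def differential_evolution(score, bounds, pop_size=10, f=0.75, cr=0.3,
                           generations=10, seed=1):
    """Maximize score(candidate) over box-bounded tunings (DE/rand/1/bin).

    All default values are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Initial population: random candidates inside the bounds.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fitness = [score(p) for p in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Three distinct population members other than candidate i.
            others = [p for j, p in enumerate(pop) if j != i]
            a, b, c = rng.sample(others, 3)
            # At least one dimension always comes from the mutant.
            j_rand = rng.randrange(len(bounds))
            trial = []
            for j, (lo, hi) in enumerate(bounds):
                if rng.random() < cr or j == j_rand:
                    value = a[j] + f * (b[j] - c[j])
                    value = min(max(value, lo), hi)  # clip to the bounds
                else:
                    value = pop[i][j]
                trial.append(value)
            trial_score = score(trial)
            # Keep the trial only if it is at least as good as its parent.
            if trial_score >= fitness[i]:
                pop[i], fitness[i] = trial, trial_score
    best = max(range(pop_size), key=fitness.__getitem__)
    return pop[best], fitness[best]


def toy_score(tunings):
    """Hypothetical objective with its maximum (0.0) at tunings (0.3, 7.0)."""
    return -((tunings[0] - 0.3) ** 2 + (tunings[1] - 7.0) ** 2)


bounds = [(0.0, 1.0), (1.0, 20.0)]
best_tunings, best_score = differential_evolution(toy_score, bounds)
print(best_tunings, best_score)
```

In a study of the kind the abstract describes, `score` would be replaced by a function that builds a learner with the candidate tunings and evaluates it on tuning data, and the returned tunings would then be checked on hold-out data that the optimizer never saw.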
Pages: 135-146
Number of pages: 12