1600 faults in 100 projects: automatically finding faults while achieving high coverage with EvoSuite

Cited by: 69
Authors
Fraser, Gordon [1]
Arcuri, Andrea [2]
Affiliations
[1] Univ Sheffield, Dept Comp Sci, Sheffield S1 4DP, S Yorkshire, England
[2] Simula Res Lab, Certus Software V&V Ctr, Lysaker, Norway
Keywords
Search-based testing; Automated test generation; Test oracles; Test-data generation
DOI
10.1007/s10664-013-9288-2
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Automated unit test generation techniques traditionally follow one of two goals: either they try to find violations of automated oracles (e.g., assertions, contracts, undeclared exceptions), or they aim to produce representative test suites (e.g., satisfying branch coverage) such that a developer can manually add test oracles. Search-based testing (SBST) has delivered promising results when it comes to achieving coverage, yet its use in conjunction with automated oracles has hardly been explored, and is generally hampered because SBST does not scale well when there are too many testing targets. In this paper we present a search-based approach to handle both objectives at the same time, implemented in the EvoSuite tool. An empirical study applying EvoSuite to 100 randomly selected open source software projects (the SF100 corpus) reveals that SBST has the unique advantage of being well suited to pursuing both traditional goals at the same time: efficiently triggering faults while producing representative test sets for any chosen coverage criterion. In our study, EvoSuite detected twice as many failures in terms of undeclared exceptions as a traditional random testing approach, witnessing thousands of real faults in the 100 open source projects. Two out of every five classes with undeclared exceptions have actual faults, but these are buried within many failures that are caused by implicit preconditions. This "noise" can be interpreted either as a call for further research in improving automated oracles, or as a call to make tools like EvoSuite an integral part of software development to enforce clean program interfaces.
Pages: 611-639
Number of pages: 29