Are Mutants a Valid Substitute for Real Faults in Software Testing?

Cited by: 400
Authors
Just, Rene [1 ]
Jalali, Darioush [1 ]
Inozemtseva, Laura [2 ]
Ernst, Michael D. [1 ]
Holmes, Reid [2 ]
Fraser, Gordon [3 ]
Affiliations
[1] Univ Washington, Seattle, WA 98195 USA
[2] Univ Waterloo, Waterloo, ON, Canada
[3] Univ Sheffield, Sheffield, S Yorkshire, England
Source
22ND ACM SIGSOFT INTERNATIONAL SYMPOSIUM ON THE FOUNDATIONS OF SOFTWARE ENGINEERING (FSE 2014) | 2014
Keywords
Test effectiveness; real faults; mutation analysis; code coverage;
DOI
10.1145/2635868.2635929
CLC number
TP31 [Computer software];
Discipline codes
081202; 0835;
Abstract
A good test suite is one that detects real faults. Because the set of faults in a program is usually unknowable, this definition is not useful to practitioners who are creating test suites, nor to researchers who are creating and evaluating tools that generate test suites. In place of real faults, testing research often uses mutants, which are artificial faults (each one a simple syntactic variation) that are systematically seeded throughout the program under test. Mutation analysis is appealing because large numbers of mutants can be automatically generated and used to compensate for low quantities or the absence of known real faults. Unfortunately, there is little experimental evidence to support the use of mutants as a replacement for real faults. This paper investigates whether mutants are indeed a valid substitute for real faults, i.e., whether a test suite's ability to detect mutants is correlated with its ability to detect real faults that developers have fixed. Unlike prior studies, these investigations also explicitly consider the conflating effects of code coverage on the mutant detection rate. Our experiments used 357 real faults in 5 open-source applications that comprise a total of 321,000 lines of code. Furthermore, our experiments used both developer-written and automatically generated test suites. The results show a statistically significant correlation between mutant detection and real fault detection, independently of code coverage. The results also give concrete suggestions on how to improve mutation analysis and reveal some inherent limitations.
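The mutation analysis the abstract describes can be illustrated with a toy sketch: seed simple syntactic variations (mutants) into a program, run the test suite against each mutant, and report the fraction of mutants the suite detects. The function, mutants, and test cases below are hypothetical illustrations, not artifacts from the paper's experiments.

```python
def original(a, b):
    return a + b          # program under test

# Each "mutant" is a simple syntactic variation of the original program.
mutants = [
    lambda a, b: a - b,   # mutate '+' to '-'
    lambda a, b: a * b,   # mutate '+' to '*'
    lambda a, b: a + b,   # equivalent mutant: behaviorally identical
]

def suite_detects(fn):
    """Return True if the test suite detects a fault in fn
    (i.e., some test case's expected output does not match)."""
    cases = [((2, 3), 5), ((0, 4), 4)]
    return any(fn(*args) != expected for args, expected in cases)

# Mutation score: detected ("killed") mutants over all mutants.
killed = sum(suite_detects(m) for m in mutants)
score = killed / len(mutants)
print(f"mutation score: {killed}/{len(mutants)} = {score:.2f}")
```

The equivalent mutant in the list shows one of the inherent limitations the abstract alludes to: no test suite can detect a mutant whose behavior is identical to the original, so the achievable mutation score is capped below 1.0.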
Pages: 654-665
Page count: 12