Bug characteristics in open source software

Cited by: 175
Authors
Tan, Lin [1]
Liu, Chen [1]
Li, Zhenmin [2]
Wang, Xuanhui [3]
Zhou, Yuanyuan [4, 5]
Zhai, Chengxiang [6]
Affiliations
[1] Univ Waterloo, Waterloo, ON N2L 3G1, Canada
[2] VMware Inc, Palo Alto, CA 94304 USA
[3] Facebook Inc, Menlo Pk, CA 94025 USA
[4] Univ Calif San Diego, La Jolla, CA 92093 USA
[5] Pattern Insight Inc, La Jolla, CA 92093 USA
[6] Univ Illinois, Urbana, IL 61801 USA
Funding
U.S. Department of Energy; U.S. National Science Foundation
Keywords
Software bug characteristics; Empirical study; Software reliability; Open source; Bug detection; SUPPORT; ERRORS; FAULTS;
DOI
10.1007/s10664-013-9258-8
Chinese Library Classification number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Designing effective tools for detecting and recovering from software failures requires a deep understanding of software bug characteristics. We study software bug characteristics by sampling 2,060 real-world bugs in three large, representative open-source projects: the Linux kernel, Mozilla, and Apache. We manually study these bugs in three dimensions: root causes, impacts, and components. We further study the correlation between categories in different dimensions, and the trends of different types of bugs. The findings include: (1) semantic bugs are the dominant root cause. As software evolves, semantic bugs increase while memory-related bugs decrease, calling for more research effort to address semantic bugs; (2) the Linux kernel operating system (OS) has more concurrency bugs than its non-OS counterparts, suggesting that more effort should go into detecting concurrency bugs in operating system code; and (3) reported security bugs are increasing, and the majority of them are caused by semantic bugs, suggesting that more support is needed to help developers diagnose and fix security bugs, especially semantic security bugs. In addition, to reduce the manual effort in building bug benchmarks for evaluating bug detection and diagnosis tools, we use machine learning techniques to classify 109,014 bugs automatically.
Pages: 1665-1705
Page count: 41