The impact of IR-based classifier configuration on the performance and the effort of method-level bug localization

Cited by: 23

Authors:

Tantithamthavorn, Chakkrit

Abebe, Surafel Lemma

Hassan, Ahmed E.

Ihara, Akinori

Matsumoto, Kenichi

Affiliations:

[1] The University of Adelaide, Australia

[2] The Addis Ababa University, Ethiopia

[3] Queen's University, Canada

[4] Nara Institute of Science and Technology, Japan

Source:

INFORMATION AND SOFTWARE TECHNOLOGY | 2018 / Vol. 102

Keywords:

Bug localization; Classifier configuration; Evaluation metrics; Top-k performance; Effort; PROBABILISTIC RANKING; SOURCE CODE; RETRIEVAL;

DOI:

10.1016/j.infsof.2018.06.001

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Context: An IR-based bug localization approach is a classifier that assists developers in locating buggy source code entities (e.g., files and methods) based on the content of a bug report. Such IR-based classifiers have various parameters that can be configured differently (e.g., the choice of entity representation). Objective: In this paper, we investigate the impact of the choice of the IR-based classifier configuration on the top-k performance and the effort required to examine source code entities before locating a bug at the method level. Method: We execute a large space of classifier configurations, 3172 in total, on 5266 bug reports of two software systems, i.e., Eclipse and Mozilla. Results: We find that (1) the choice of classifier configuration impacts the top-k performance from 0.44% to 36% and the required effort from 4395 to 50,000 LOC; (2) classifier configurations with similar top-k performance might require different efforts; (3) VSM achieves both the best top-k performance and the least required effort for method-level bug localization; (4) the likelihood of randomly picking a configuration that performs within 20% of the best top-k classifier configuration is on average 5.4%, and that of picking one within 20% of the least-effort configuration is on average 1%; (5) configurations related to the entity representation of the analyzed data have the most impact on both the top-k performance and the required effort; and (6) the most efficient classifier configuration obtained at the method level can also be used at the file level (and vice versa). Conclusion: Our results lead us to conclude that configuration has a large impact on both the top-k performance and the required effort for method-level bug localization, suggesting that the IR-based configuration settings should be carefully selected and the required effort metric should be included in future bug localization studies.
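The two measures named in the abstract can be illustrated with a minimal Python sketch. The abstract does not define them, so the definitions here are assumptions: top-k performance is taken as the fraction of bug reports with at least one buggy method among the first k ranked methods, and effort as the cumulative LOC of the methods examined, in ranked order, up to and including the first buggy one. The function names and the toy data are hypothetical and do not come from the paper.

```python
from typing import Dict, List, Set


def top_k_performance(
    rankings: Dict[str, List[str]],
    buggy: Dict[str, Set[str]],
    k: int = 10,
) -> float:
    """Fraction of bug reports with at least one buggy method in the top k.

    Assumed definition; the abstract does not spell it out.
    """
    hits = sum(
        1
        for bug_id, ranked in rankings.items()
        if any(method in buggy[bug_id] for method in ranked[:k])
    )
    return hits / len(rankings)


def effort_loc(ranked: List[str], buggy_methods: Set[str], loc: Dict[str, int]) -> int:
    """Cumulative LOC examined until the first buggy method is reached.

    Assumed definition: methods are read in ranked order, and the LOC of the
    first buggy method is included in the total.
    """
    total = 0
    for method in ranked:
        total += loc[method]
        if method in buggy_methods:
            break
    return total


# Hypothetical toy data: two bug reports, three methods.
rankings = {"bug1": ["a", "b", "c"], "bug2": ["c", "a", "b"]}
buggy = {"bug1": {"b"}, "bug2": {"b"}}
loc = {"a": 10, "b": 20, "c": 5}

top2 = top_k_performance(rankings, buggy, k=2)
effort_bug1 = effort_loc(rankings["bug1"], buggy["bug1"], loc)
effort_bug2 = effort_loc(rankings["bug2"], buggy["bug2"], loc)
```

In this toy data, both rankings hit within the top 3, yet locating the bug costs a different number of LOC for each report, which mirrors finding (2): similar top-k performance can hide different effort.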


Pages: 160-174

Page count: 15
