Large-scale information retrieval in software engineering - an experience report from industrial application

Cited by: 0
Authors
Michael Unterkalmsteiner
Tony Gorschek
Robert Feldt
Niklas Lavesson
Affiliations
[1] Blekinge Institute of Technology, Department of Software Engineering
[2] Blekinge Institute of Technology, Department of Computer Science and Engineering
Source
Empirical Software Engineering | 2016 / Vol. 21
Keywords
Test case selection; Information retrieval; Data mining; Experiment
DOI
Not available
Chinese Library Classification number
Subject classification number
Abstract
Software engineering activities are information intensive. Research proposes Information Retrieval (IR) techniques to support engineers in their daily tasks, such as establishing and maintaining traceability links, fault identification, and software maintenance. We describe an engineering task, test case selection, and illustrate our problem analysis and solution discovery process. The objective of the study is to understand to what extent IR techniques (one potential solution) can be applied to test case selection and provide decision support in a large-scale, industrial setting. We analyze, in the context of the studied company, how test case selection is performed, and we design a series of experiments evaluating the performance of different IR techniques. Each experiment provides lessons learned from implementation, execution, and results, feeding into its successor. The three experiments led to the following observations: 1) there is a lack of research on scalable parameter optimization of IR techniques for software engineering problems; 2) scaling IR techniques to industry data is challenging, in particular for latent semantic analysis; 3) the IR context poses constraints on the empirical evaluation of IR techniques, requiring more research on developing valid statistical approaches. We believe that our experiences in conducting a series of IR experiments with industry-grade data are valuable for peer researchers, who can thereby avoid the pitfalls we encountered. Furthermore, we identified challenges that need to be addressed in order to bridge the gap between laboratory IR experiments and real applications of IR in industry.
Pages: 2324-2365
Page count: 41