Large-scale information retrieval in software engineering - an experience report from industrial application

Cited by: 0
Authors
Michael Unterkalmsteiner
Tony Gorschek
Robert Feldt
Niklas Lavesson
Affiliations
[1] Blekinge Institute of Technology, Department of Software Engineering
[2] Blekinge Institute of Technology, Department of Computer Science and Engineering
Source
Empirical Software Engineering | 2016 / Vol. 21
Keywords
Test case selection; Information retrieval; Data mining; Experiment
DOI
Not available
Abstract
Software engineering activities are information intensive. Research proposes Information Retrieval (IR) techniques to support engineers in their daily tasks, such as establishing and maintaining traceability links, fault identification, and software maintenance. We describe an engineering task, test case selection, and illustrate our problem analysis and solution discovery process. The objective of the study is to understand to what extent IR techniques (one potential solution) can be applied to test case selection and provide decision support in a large-scale, industrial setting. We analyze, in the context of the studied company, how test case selection is performed and design a series of experiments evaluating the performance of different IR techniques. Each experiment provides lessons learned from implementation, execution, and results, feeding into its successor. The three experiments led to the following observations: 1) there is a lack of research on scalable parameter optimization of IR techniques for software engineering problems; 2) scaling IR techniques to industry data is challenging, in particular for latent semantic analysis; 3) the IR context poses constraints on the empirical evaluation of IR techniques, requiring more research on developing valid statistical approaches. We believe that our experiences in conducting a series of IR experiments with industry-grade data are valuable for peer researchers so that they can avoid the pitfalls that we have encountered. Furthermore, we identified challenges that need to be addressed in order to bridge the gap between laboratory IR experiments and real applications of IR in industry.
Pages: 2324-2365
Page count: 41