Applying a smoothing filter to improve IR-based traceability recovery processes: An empirical investigation

Cited by: 18
Authors
De Lucia, Andrea [1]
Di Penta, Massimiliano [2]
Oliveto, Rocco [3]
Panichella, Annibale [1]
Panichella, Sebastiano [2]
Affiliations
[1] Univ Salerno, Fisciano, SA, Italy
[2] Univ Sannio, I-82100 Benevento, Italy
[3] Univ Molise, I-86090 Pesche, IS, Italy
Keywords
Software traceability; Information retrieval; Smoothing filters; Empirical software engineering; CODE; COHESION; LINKS;
DOI
10.1016/j.infsof.2012.08.002
CLC classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Context: Traceability relations among software artifacts tend to be missing, outdated, or lost. For this reason, various traceability recovery approaches based on Information Retrieval (IR) techniques have been proposed. The performance of such approaches is often affected by "noise" contained in software artifacts (e.g., recurring words in document templates or other words that do not contribute to the retrieval itself). Aim: As a complement and alternative to stop word removal, this paper proposes the use of a smoothing filter to remove "noise" from the textual corpus of the artifacts to be traced. Method: We evaluate the effect of a smoothing filter on traceability recovery tasks involving different kinds of artifacts from five software projects, applying three different IR methods, namely the Vector Space Model, Latent Semantic Indexing, and the Jensen-Shannon similarity model. Results: Our study indicates that, with the exception of some specific kinds of artifacts (i.e., tracing test cases to source code), the proposed approach significantly improves the performance of traceability recovery and removes "noise" that simple stop word filters cannot remove. Conclusions: The obtained results not only help to develop traceability recovery approaches able to work in the presence of noisy artifacts, but also suggest that smoothing filters can be used to improve the performance of other software engineering approaches based on textual analysis. (C) 2012 Elsevier B.V. All rights reserved.
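The abstract describes the smoothing filter only at a high level. As a minimal sketch, assuming one plausible reading borrowed from image processing, each artifact's term vector has the mean term vector of its own artifact set subtracted from it, so that terms occurring uniformly across a set (such as template words) are flattened to zero. This is an assumption, not the paper's published procedure. The sketch then ranks candidate links with cosine similarity under a plain term-frequency Vector Space Model; Latent Semantic Indexing and the Jensen-Shannon model are not shown. All function names (`tf_vector`, `smooth`, `cosine`, `rank_candidate_links`) and the toy artifacts are hypothetical.

```python
import math
from collections import Counter


def tf_vector(doc, vocab):
    """Raw term-frequency vector of one artifact over a shared vocabulary."""
    counts = Counter(doc.lower().split())
    return [float(counts[t]) for t in vocab]


def smooth(vectors):
    """Hypothetical smoothing filter: subtract the set's mean vector from each vector.

    Terms that occur uniformly across every artifact of the set (e.g. template
    words) end up with a weight of zero. Negative weights are NOT clipped here.
    """
    n = len(vectors)
    mean = [sum(column) / n for column in zip(*vectors)]
    return [[w - m for w, m in zip(vec, mean)] for vec in vectors]


def cosine(a, b):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_candidate_links(sources, targets, use_filter=True):
    """Rank all (source, target) pairs by similarity, highest first."""
    vocab = sorted({w for d in sources + targets for w in d.lower().split()})
    src = [tf_vector(d, vocab) for d in sources]
    tgt = [tf_vector(d, vocab) for d in targets]
    if use_filter:
        # Assumption: each artifact category is filtered with its own mean vector.
        src, tgt = smooth(src), smooth(tgt)
    links = [(i, j, cosine(s, t))
             for i, s in enumerate(src) for j, t in enumerate(tgt)]
    return sorted(links, key=lambda link: link[2], reverse=True)


# Hypothetical toy artifacts: "use case actor" and "class" play the role of template noise.
sources = ["use case actor login user password",
           "use case actor print report pages"]
targets = ["class login user password check",
           "class print report pages format"]

for i, j, score in rank_candidate_links(sources, targets):
    print(f"source {i} -> target {j}: {score:+.3f}")
```

In this toy data the template words ("use", "case", "actor" in the sources, "class" in the targets) drop to zero after filtering, the correct links score about 0.87 instead of about 0.55 without the filter, and the incorrect links become negative. Negative components are kept as-is in this sketch; whether the original approach clips or rescales them is not stated in the abstract.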
Pages: 741-754
Number of pages: 14