Estimating the number of remaining links in traceability recovery

Cited by: 26
Authors
Falessi, Davide [1]
Di Penta, Massimiliano [2]
Canfora, Gerardo [2]
Cantone, Giovanni [3]
Affiliations
[1] Calif Polytech State Univ San Luis Obispo, Dept Comp Sci, San Luis Obispo, CA 93407 USA
[2] Univ Sannio, Dept Engn, Benevento, BN, Italy
[3] Univ Roma Tor Vergata, Dept Civil Engn & Comp Sci, DICII, Rome, Italy
Keywords
Information retrieval; Traceability link recovery; Metrics and measurement; Capture-recapture; Requirements; Code
DOI
10.1007/s10664-016-9460-6
Chinese Library Classification number
TP31 [Computer software]
Subject classification codes
081202; 0835
Abstract
Although very important in software engineering, establishing traceability links between software artifacts is extremely tedious and error-prone, and it requires significant effort. Even when approaches for automated traceability recovery exist, they provide the requirements analyst with a ranked list of candidate links, usually a very long one, that must be manually inspected. In this paper we introduce an approach called Estimation of the Number of Remaining Links (ENRL), which aims to estimate, via Machine Learning (ML) classifiers, the number of remaining positive links in a ranked list of candidate traceability links produced by a recovery approach based on Natural Language Processing (NLP) techniques. We evaluated the accuracy of ENRL by considering several ML classifiers and NLP techniques on three datasets from industry and academia, covering traceability links among different kinds of software artifacts, including requirements, use cases, design documents, source code, and test cases. Results from our study indicate that: (i) specific estimation models are able to provide accurate estimates of the number of remaining positive links; (ii) the estimation accuracy depends on the choice of the NLP technique; and (iii) univariate estimation models outperform multivariate ones.
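The general idea of estimating remaining positive links from a ranked list can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a univariate logistic model over a single similarity score, fitted on links the analyst has already inspected, and it takes the sum of predicted probabilities over the uninspected candidates as the estimate. All names and data values below are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real value to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(scores, labels, lr=0.5, epochs=2000):
    """Fit p(link | score) = sigmoid(w * score + b) by batch gradient descent.

    Assumption: one predictor (a single similarity score), i.e. a
    univariate model in the abstract's terminology.
    """
    w, b = 0.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_w = 0.0
        grad_b = 0.0
        for s, y in zip(scores, labels):
            p = sigmoid(w * s + b)
            grad_w += (p - y) * s
            grad_b += p - y
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def estimate_remaining(w, b, uninspected_scores):
    """Expected number of positive links: sum of predicted probabilities."""
    return sum(sigmoid(w * s + b) for s in uninspected_scores)


# Hypothetical similarity scores for candidate links already inspected,
# with the analyst's verdicts (1 = true link, 0 = false positive).
inspected_scores = [0.92, 0.88, 0.81, 0.75, 0.70, 0.64, 0.55, 0.47]
inspected_labels = [1, 1, 1, 0, 1, 0, 0, 0]

# Hypothetical scores for candidates that have not been inspected yet.
uninspected_scores = [0.60, 0.52, 0.41, 0.33, 0.20, 0.12]

w, b = fit_logistic(inspected_scores, inspected_labels)
estimate = estimate_remaining(w, b, uninspected_scores)
print(f"Estimated remaining positive links: {estimate:.2f}")
```

An analyst could compare such an estimate against the cost of inspecting the rest of the list when deciding whether to stop. A multivariate variant would use scores from several NLP techniques as predictors instead of one.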
Pages: 996-1027
Number of pages: 32