An empirical study on the importance of source code entities for requirements traceability

Cited by: 27
Authors
Ali, Nasir [1]
Sharafi, Zohreh [2]
Gueheneuc, Yann-Gael [2]
Antoniol, Giuliano [2]
Affiliations
[1] Univ Waterloo, Dept Elect & Comp Engn, Kingston, ON, Canada
[2] Ecole Polytech, DGIGL, Montreal, PQ H3C 3A7, Canada
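Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
LINKS;
DOI
10.1007/s10664-014-9315-y
Chinese Library Classification number
TP31 [Computer software];
Discipline classification code
081202; 0835;
Abstract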
Requirements Traceability (RT) links help developers during program comprehension and maintenance tasks. However, creating RT links is a laborious and resource-consuming task. Information Retrieval (IR) techniques are useful to automatically create traceability links. However, IR-based techniques typically have low accuracy (precision, recall, or both) and thus, creating RT links remains a human-intensive process. We conjecture that understanding how developers verify RT links could help improve the accuracy of IR-based RT techniques to create RT links. Consequently, we perform an empirical study consisting of four case studies. First, we use an eye-tracking system to capture developers' eye movements while they verify RT links. We analyse the obtained data to identify and rank developers' preferred types of Source Code Entities (SCEs), e.g., domain vs. implementation-level source code terms and class names vs. method names. Second, we perform another eye-tracking case study to confirm that it is the semantic content of the developers' preferred types of SCEs, and not their locations, that attracts developers' attention and helps them in their task of verifying RT links. Third, we propose an improved term weighting scheme, i.e., Developers Preferred Term Frequency/Inverse Document Frequency (DPTF/IDF), that uses the knowledge of the developers' preferred types of SCEs to give more importance to these SCEs in the term weighting scheme. We integrate this weighting scheme with an IR technique, i.e., Latent Semantic Indexing (LSI), to create a new technique for RT link recovery. Using three systems (iTrust, Lucene, and Pooka), we show that the proposed technique statistically improves the accuracy of the recovered RT links over a technique based on LSI and the usual Term Frequency/Inverse Document Frequency (TF/IDF) weighting scheme. Finally, we compare the newly proposed DPTF/IDF with our original Domain Or Implementation/Inverse Document Frequency (DOI/IDF) weighting scheme.
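To illustrate the general idea of giving some types of source code entities more importance in a TF/IDF-style weighting, here is a minimal Python sketch. It is not the DPTF/IDF formula from the paper, which the abstract does not state: the entity types, the numeric weights in `ENTITY_WEIGHTS`, the normalisation, and the toy documents are all assumptions made for illustration, and the LSI step is left out.

```python
import math
from collections import Counter

# Hypothetical per-entity-type weights; the abstract gives no values.
ENTITY_WEIGHTS = {
    "class_name": 2.0,
    "method_name": 1.5,
    "variable_name": 1.0,
    "comment": 0.5,
}


def weighted_tf(doc):
    """Entity-weighted term frequency for one document.

    `doc` is a list of (term, entity_type) pairs. Each occurrence
    contributes the weight of its entity type instead of 1, and the
    result is normalised by the total weight in the document.
    """
    totals = Counter()
    for term, entity_type in doc:
        totals[term] += ENTITY_WEIGHTS.get(entity_type, 1.0)
    norm = sum(totals.values())
    return {t: w / norm for t, w in totals.items()} if norm else {}


def idf(docs):
    """Plain inverse document frequency: log(N / document frequency)."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        for term in {t for t, _ in doc}:
            df[term] += 1
    return {t: math.log(n / c) for t, c in df.items()}


def weighted_tf_idf(docs):
    """One {term: weight} vector per document."""
    idf_values = idf(docs)
    return [
        {t: tf * idf_values[t] for t, tf in weighted_tf(doc).items()}
        for doc in docs
    ]


# Toy corpus (invented): two "classes" described by tagged terms.
docs = [
    [("patient", "class_name"), ("save", "method_name"), ("index", "comment")],
    [("index", "class_name"), ("save", "method_name"), ("buffer", "variable_name")],
]
vectors = weighted_tf_idf(docs)
```

In this sketch a term that occurs as a class name contributes more to its document vector than the same term in a comment, while the IDF factor still suppresses terms that occur in every document. The resulting vectors could then be passed to an LSI implementation in place of plain TF/IDF vectors.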
Pages: 442-478
Number of pages: 37