Exploiting Parts-of-Speech for effective automated requirements traceability

Cited by: 23
Authors
Ali, Nasir [1]
Cai, Haipeng [2]
Hamou-Lhadj, Abdelwahab [3]
Hassine, Jameleddine [4]
Affiliations
[1] Univ Memphis, Dept Comp Sci, Memphis, TN 38152 USA
[2] Washington State Univ, Sch Elect Engn & Comp Sci, Pullman, WA 99164 USA
[3] Concordia Univ, Elect & Comp Engn Dept, Montreal, PQ, Canada
[4] King Fahd Univ Petr & Minerals, Dept Informat & Comp Sci, Dhahran, Saudi Arabia
Keywords
Requirements traceability (RT); Parts of Speech (POS); Information retrieval (IR); Trace links; DESIGN-CODE TRACEABILITY; DOCUMENTATION; LINKS
DOI
10.1016/j.infsof.2018.09.009
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Context: Requirement traceability (RT) is defined as the ability to describe and follow the life of a requirement. RT helps developers ensure that relevant requirements are implemented and that the source code is consistent with its requirements with respect to a set of traceability links called trace links. Previous work leverages Parts Of Speech (POS) tagging of software artifacts to recover trace links among them. These studies work on the premise that discarding one or more POS tags improves the accuracy of Information Retrieval (IR) techniques.
Objective: First, we show empirically that excluding one or more POS tags could negatively impact the accuracy of existing IR-based traceability approaches, namely the Vector Space Model (VSM) and the Jensen-Shannon Model (JSM). Second, we propose a method that improves the accuracy of IR-based traceability approaches.
Method: We developed an approach, called ConPOS, to recover trace links using constraint-based pruning. ConPOS uses major POS categories and prunes the recovered trace links by applying constraints, a filtering step that significantly improves the effectiveness of IR-based techniques. We conducted an experiment to provide evidence that removing POS tags does not improve the accuracy of IR techniques. Furthermore, we conducted two empirical studies to evaluate the effectiveness of ConPOS in recovering trace links compared to existing peer RT approaches.
Results: The results of the first empirical study show that removing one or more POS tags negatively impacts the accuracy of VSM and JSM. Furthermore, the results from the other empirical studies show that ConPOS provides 11%-107%, 8%-64%, and 15%-170% higher precision, recall, and mean average precision (MAP) than VSM and JSM.
Conclusion: We showed that ConPOS outperforms existing IR-based RT approaches that discard some POS tags from the input documents.
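The two baseline IR techniques named in the abstract, VSM and JSM, can be illustrated with a minimal sketch. This is not the ConPOS implementation and does not reproduce the paper's preprocessing, POS handling, or constraints. It only assumes the common textbook formulations: cosine similarity over TF-IDF vectors for VSM, and one minus the base-2 Jensen-Shannon divergence between term distributions for JSM. All function names, the tokenizer, the smoothed IDF formula, and the sample texts are hypothetical choices made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and split on non-alphanumeric characters."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return cleaned.split()


def vsm_similarity(query, document, corpus):
    """Cosine similarity between TF-IDF vectors (assumed VSM formulation)."""
    n_docs = len(corpus)
    doc_freq = Counter()
    for text in corpus:
        doc_freq.update(set(tokenize(text)))

    def tfidf(text):
        counts = Counter(tokenize(text))
        # Smoothed IDF (an illustrative choice) keeps every weight positive.
        return {
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        }

    q, d = tfidf(query), tfidf(document)
    dot = sum(weight * d.get(term, 0.0) for term, weight in q.items())
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    norm_d = math.sqrt(sum(w * w for w in d.values()))
    return dot / (norm_q * norm_d) if norm_q and norm_d else 0.0


def jsm_similarity(query, document):
    """1 - Jensen-Shannon divergence (base 2) between term distributions."""

    def distribution(text):
        counts = Counter(tokenize(text))
        total = sum(counts.values())
        return {term: count / total for term, count in counts.items()}

    p, q = distribution(query), distribution(document)
    if not p or not q:
        return 0.0
    # Mixture distribution over the union of both vocabularies.
    m = {t: 0.5 * (p.get(t, 0.0) + q.get(t, 0.0)) for t in set(p) | set(q)}

    def kl_to_mixture(a):
        return sum(prob * math.log2(prob / m[t]) for t, prob in a.items())

    return 1.0 - 0.5 * (kl_to_mixture(p) + kl_to_mixture(q))


def rank_links(requirement, code_docs, use_jsm=False):
    """Rank candidate trace links from one requirement to code documents."""
    corpus = list(code_docs.values()) + [requirement]
    scored = []
    for name, text in code_docs.items():
        if use_jsm:
            score = jsm_similarity(requirement, text)
        else:
            score = vsm_similarity(requirement, text, corpus)
        scored.append((name, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical requirement and code vocabularies, invented for this sketch.
    requirement = "The user shall be able to add an email account"
    code_docs = {
        "AccountManager": "add account email user create account",
        "Renderer": "draw window pixel buffer",
    }
    print(rank_links(requirement, code_docs))
    print(rank_links(requirement, code_docs, use_jsm=True))
```

Both functions return a score where higher means more similar, so a candidate link list is simply the code documents sorted by score. A POS-based approach would act before this step (by selecting which terms enter the documents) or after it (by filtering the ranked links).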
Pages: 126-141
Number of pages: 16