Using Consensual Biterms from Text Structures of Requirements and Code to Improve IR-Based Traceability Recovery

Times cited: 8
Authors
Gao, Hui [1]
Kuang, Hongyu [1]
Sun, Kexin [1]
Ma, Xiaoxing [1]
Egyed, Alexander [2]
Maeder, Patrick [3]
Rong, Guoping [1]
Shao, Dong [1]
Zhang, He [1]
Affiliations
[1] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing, Peoples R China
[2] Johannes Kepler Univ Linz, Inst Software Syst Engn, Linz, Austria
[3] Tech Univ Ilmenau, Fak Informat & Automatisierung, Ilmenau, Germany
Source
PROCEEDINGS OF THE 37TH IEEE/ACM INTERNATIONAL CONFERENCE ON AUTOMATED SOFTWARE ENGINEERING, ASE 2022 | 2022
Funding
Austrian Science Fund; National Natural Science Foundation of China
Keywords
traceability recovery; text structures; biterm; information retrieval; INFORMATION-RETRIEVAL; FEATURE LOCATION; LINKS; EXECUTION;
DOI
10.1145/3551349.3556948
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Traceability establishes trace links among software artifacts based on whether two artifacts are related by system functionalities. These traces are valuable for software development but are difficult to obtain manually. To cope with costly and error-prone manual recovery, automated approaches have been proposed that recover traces from textual similarities among software artifacts, such as those based on Information Retrieval (IR). However, the low quality and quantity of artifact texts negatively affect the calculated IR values, greatly hindering the performance of IR-based approaches. In this study, we propose to extract co-occurring word pairs from the text structures of both requirements and code (i.e., consensual biterms) to improve IR-based traceability recovery. We first collect a set of biterms based on the part-of-speech of requirement texts and then filter them through the code texts. We then use these consensual biterms both to enrich the input corpus for IR techniques and to enhance the calculation of IR values. An evaluation on nine systems shows that, in general, when solely used to enhance IR techniques, our approach can outperform pure IR-based approaches and another baseline by 21.9% and 21.8% in AP, and by 9.3% and 7.2% in MAP, respectively. Moreover, when combined with another enhancement strategy that works from a different perspective, it can outperform this baseline by 5.9% in AP and 4.8% in MAP.
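The abstract only outlines the pipeline (collect biterms from requirements by part-of-speech, keep those confirmed by code, then use them to enrich the IR corpus). The Python block below is a minimal sketch of that idea under our own simplifying assumptions, not the authors' implementation: POS tags are supplied by hand as Penn Treebank tags instead of coming from a tagger; every unordered pair of nouns, verbs, and adjectives in one requirement sentence is a candidate biterm; a candidate counts as "consensual" if both words occur inside a single code identifier; enrichment appends a pseudo-term `a_b` to any document containing both words; and the IR model is plain tf-idf with cosine similarity. All function names (`requirement_biterms`, `consensual_biterms`, `enrich`, `rank_code`) are hypothetical.

```python
import math
import re
from collections import Counter
from itertools import combinations

# Assumption: only nouns, verbs and adjectives (Penn Treebank tags) form biterms.
CONTENT_TAGS = {"NN", "NNS", "NNP", "VB", "VBZ", "VBP", "JJ"}
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def split_identifier(identifier):
    """Split a camelCase / snake_case identifier into lowercase words."""
    parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", identifier)
    return [p.lower() for p in parts]


def tokens(text):
    """Tokenize natural-language or code text into lowercase words."""
    return [w for tok in IDENTIFIER.findall(text) for w in split_identifier(tok)]


def requirement_biterms(tagged_sentence):
    """Candidate biterms: unordered pairs of content words in one sentence."""
    words = sorted({w.lower() for w, tag in tagged_sentence if tag in CONTENT_TAGS})
    return set(combinations(words, 2))


def consensual_biterms(candidates, code_texts):
    """Keep candidates whose two words co-occur inside one code identifier.

    Assumption: each identifier is treated as one code "text structure".
    """
    structures = [set(split_identifier(tok))
                  for text in code_texts for tok in IDENTIFIER.findall(text)]
    return {(a, b) for a, b in candidates
            if any(a in s and b in s for s in structures)}


def enrich(doc_tokens, biterms):
    """Append a pseudo-term 'a_b' for each biterm whose words both occur in the doc."""
    present = set(doc_tokens)
    return doc_tokens + [f"{a}_{b}" for a, b in sorted(biterms)
                         if a in present and b in present]


def tfidf(docs):
    """Raw term frequency times log(N / df) for each document."""
    df = Counter(t for doc in docs for t in set(doc))
    return [{t: c * math.log(len(docs) / df[t]) for t, c in Counter(doc).items()}
            for doc in docs]


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def rank_code(req_text, code_texts, biterms=frozenset()):
    """Similarity of one requirement to each code artifact (optionally enriched)."""
    docs = [enrich(tokens(t), biterms) for t in [req_text, *code_texts]]
    vecs = tfidf(docs)
    return [cosine(vecs[0], v) for v in vecs[1:]]


if __name__ == "__main__":
    req = [("The", "DT"), ("system", "NN"), ("shall", "MD"), ("display", "VB"),
           ("the", "DT"), ("patient", "NN"), ("record", "NN")]
    code = ["class PatientRecordView { void displayRecord(PatientRecord record) {} }",
            "class LoginService { boolean checkPassword(String password) {} }"]
    biterms = consensual_biterms(requirement_biterms(req), code)
    # biterms == {("display", "record"), ("patient", "record")}
    text = " ".join(w for w, _ in req)
    print("plain:   ", rank_code(text, code))
    print("enriched:", rank_code(text, code, biterms))
```

With these toy inputs, only `("display", "record")` and `("patient", "record")` survive the code filter, because they are the only candidate pairs that co-occur inside one identifier (`displayRecord`, `PatientRecord`). The enriched run gives the requirement a higher similarity to `PatientRecordView` than the plain run, while the unrelated `LoginService` stays at zero.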
Pages: 13