BTLink: automatic link recovery between issues and commits based on pre-trained BERT model

Cited by: 9
Authors
Lan, Jinpeng [1]
Gong, Lina [1, 2, 3]
Zhang, Jingxuan [1]
Zhang, Haoxiang [4]
Affiliations
[1] Nanjing Univ Aeronaut & Astronaut, Coll Comp Sci & Technol, Nanjing, Peoples R China
[2] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing, Peoples R China
[3] Nanjing Univ Aeronaut & Astronaut, Key Lab Safety Crit Software, Nanjing, Peoples R China
[4] Queens Univ, Software Anal & Intelligence Lab SAIL, Kingston, ON, Canada
Funding
National Natural Science Foundation of China
Keywords
Issue-commit links recovery; Issue report; Commit; pre-trained BERT model; Mining software repositories; ANALYSIS ALGORITHMS; FEATURE LOCATION; PROGRAM;
DOI
10.1007/s10664-023-10342-7
Chinese Library Classification number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
Data traceability in software development connects different software artifacts to enhance the observability of developer practices. In particular, traceability links between issues and commits (i.e., issue-commit links) play a key role in software maintenance tasks (e.g., bug localization and bug prediction). In practice, developers typically create issue-commit links manually by adding the issue identifier to the message of the corresponding commit, so missing issue-commit links are prevalent in software projects. To recover the missing issue-commit links, previous studies have proposed several automatic approaches. However, because heuristic rules differ from real-world behavior and because these approaches have insufficient semantic understanding, they cannot achieve the expected performance. Since the text in issues and commits carries highly related information, thorough text understanding can improve traceability link recovery. Meanwhile, pre-trained models (PTMs) have been successfully used to explore the semantic information of text in various software engineering tasks (e.g., software code generation). Therefore, our study proposes a novel BERT-based method (i.e., BTLink) that employs pre-trained models to automatically recover issue-commit links. Our proposed BTLink method includes a BERT embedding layer, a fusion layer, and a classifier layer. First, we build two pre-trained BERT encoders: one explores the feature representation of the issue text in combination with the commit code, and the other that of the issue text in combination with the commit text. Then we build the fusion layer to examine the joint feature vector. Finally, we build the classifier layer to identify the links between issues and commits.
In addition, to further our investigation and verify the effectiveness of BTLink, we conduct an extensive case study on 12 issue-commit link datasets from open source software projects, and observe that: (i) compared to state-of-the-art approaches, our proposed BTLink improves the performance of automatic issue-commit link recovery on all studied measures; (ii) both text and code information in the issues and commits help recover more accurate issue-commit links; (iii) our proposed BTLink is more applicable to the cross-project context than state-of-the-art approaches.
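The three-layer design described in the abstract (two pair encoders, a fusion layer, a classifier layer) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state how fusion or classification are computed, so concatenation and a logistic unit are assumptions here, and `toy_encode` is a hypothetical hashed bag-of-words stand-in used only so the example runs without a BERT checkpoint. All function names and the `DIM` constant are illustrative.

```python
import hashlib
import math
from typing import List

DIM = 16  # toy embedding width (assumption); a real BERT encoder emits far wider vectors


def toy_encode(first: str, second: str) -> List[float]:
    """Hypothetical stand-in for a pre-trained pair encoder.

    Hashes each whitespace token of the joined pair into a fixed-size
    bag-of-words vector and L2-normalizes it. It has no semantic understanding.
    """
    vec = [0.0] * DIM
    for token in (first + " [SEP] " + second).lower().split():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        vec[digest[0] % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def fuse(text_vec: List[float], code_vec: List[float]) -> List[float]:
    """Fusion layer sketch: concatenate the two feature vectors (assumption)."""
    return text_vec + code_vec


def classify(joint: List[float], weights: List[float], bias: float) -> float:
    """Classifier layer sketch: a single logistic unit over the joint vector (assumption)."""
    z = sum(w * x for w, x in zip(weights, joint)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def link_probability(issue_text: str, commit_message: str, commit_diff: str,
                     weights: List[float], bias: float) -> float:
    """Score one issue-commit pair as a candidate link."""
    text_vec = toy_encode(issue_text, commit_message)  # issue text + commit text
    code_vec = toy_encode(issue_text, commit_diff)     # issue text + commit code
    return classify(fuse(text_vec, code_vec), weights, bias)


if __name__ == "__main__":
    untrained = [0.0] * (2 * DIM)  # weights would normally be learned from labeled links
    score = link_probability(
        "NullPointerException in parser",
        "Fix null check in parser",
        "+ if (node != null) { visit(node); }",
        untrained,
        0.0,
    )
    print(score)
```

In a realistic setting the two `toy_encode` calls would be replaced by pre-trained encoders and the classifier weights would be learned from known issue-commit links; the sketch only shows how the pieces connect.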
Pages: 55