BTLink: automatic link recovery between issues and commits based on pre-trained BERT model

Cited by: 9
Authors
Lan, Jinpeng [1]
Gong, Lina [1, 2, 3]
Zhang, Jingxuan [1]
Zhang, Haoxiang [4]
Affiliations
[1] Nanjing Univ Aeronaut & Astronaut, Coll Comp Sci & Technol, Nanjing, Peoples R China
[2] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing, Peoples R China
[3] Nanjing Univ Aeronaut & Astronaut, Key Lab Safety Crit Software, Nanjing, Peoples R China
[4] Queens Univ, Software Anal & Intelligence Lab SAIL, Kingston, ON, Canada
Funding
National Natural Science Foundation of China;
Keywords
Issue-commit links recovery; Issue report; Commit; pre-trained BERT model; Mining software repositories; ANALYSIS ALGORITHMS; FEATURE LOCATION; PROGRAM;
DOI
10.1007/s10664-023-10342-7
Chinese Library Classification (CLC) number
TP31 [Computer Software];
Discipline classification codes
081202; 0835;
Abstract
Data traceability in software development can connect different software artifacts to enhance the observability of developer practices. In particular, traceability links between issues and commits (i.e., issue-commit links) play a key role in software maintenance tasks (e.g., bug localization and bug prediction). In practice, developers typically create issue-commit links manually by adding the issue identifier to the message of the corresponding commit, so missing issue-commit links are prevalent in software projects. To recover the missing issue-commit links, previous studies have proposed several automatic approaches. However, due to the gap between heuristic rules and real-world behavior, as well as insufficient semantic understanding, these approaches cannot achieve the expected performance. Since the text in issues and commits contains highly related information, thorough text understanding can improve traceability links. Meanwhile, pre-trained models (PTMs) have been successfully used to explore the semantic information of text in various software engineering tasks (e.g., software code generation). Therefore, our study proposes a novel BERT-based method, BTLink, that employs pre-trained models to automatically recover issue-commit links. BTLink includes a BERT embedding layer, a fusion layer, and a classifier layer. First, we build two pre-trained BERT encoders to explore the feature representation of the issue text in combination with the commit code and the commit text, respectively. Then we build the fusion layer to examine the joint feature vector. Finally, we build the classifier layer to identify the links between issues and commits.
In addition, to further our investigation and verify the effectiveness of BTLink, we conduct an extensive case study on 12 issue-commit link datasets from open source software projects, and observe that: (i) compared to state-of-the-art approaches, BTLink improves the performance of automatic issue-commit link recovery on all studied measures; (ii) both the text and the code information in issues and commits help recover more accurate issue-commit links; (iii) BTLink is more applicable to the cross-project context than state-of-the-art approaches.
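The manual linking practice mentioned in the abstract can be illustrated with a small sketch. It is not part of BTLink; it only shows, under assumed conventions, how explicit issue identifiers might be read from commit messages and why commits without such an identifier end up as candidates for link recovery. The reference pattern (`#123` or `PROJ-123`) and the helper names `explicit_issue_ids` and `split_commits` are hypothetical and chosen for illustration; real projects may use other conventions.

```python
import re
from typing import Dict, List, Set, Tuple

# Assumed identifier styles: "#123" (GitHub-like) or "PROJ-123" (tracker-key-like).
# The lookbehind avoids matching inside a longer word such as "abc#42".
ISSUE_REF = re.compile(r"(?<![\w-])(?:#(\d+)|([A-Z][A-Z0-9]+-\d+))\b")


def explicit_issue_ids(commit_message: str) -> Set[str]:
    """Return the issue identifiers written explicitly in a commit message."""
    ids = set()
    for match in ISSUE_REF.finditer(commit_message):
        # Exactly one of the two groups is set for each match.
        ids.add(match.group(1) or match.group(2))
    return ids


def split_commits(commits: Dict[str, str]) -> Tuple[Dict[str, Set[str]], List[str]]:
    """Separate commits with an explicit issue reference from those without one.

    `commits` maps a commit hash to its message. Commits in the second
    return value carry no identifier, so any link they have to an issue
    is missing and would have to be recovered by other means.
    """
    linked: Dict[str, Set[str]] = {}
    unlinked: List[str] = []
    for sha, message in commits.items():
        ids = explicit_issue_ids(message)
        if ids:
            linked[sha] = ids
        else:
            unlinked.append(sha)
    return linked, unlinked
```

For example, `split_commits({"a1": "Fix #42", "b2": "Refactor parser"})` places `a1` in the linked group and `b2` in the unlinked group; the second group is the kind of commit that a learned approach such as the one described above would need to match against issues by content rather than by identifier.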
Pages: 55