Machine learning approaches for automated software traceability: A systematic literature review

Times cited: 0
Authors
Alturayeif, Nouf [1,2]
Hassine, Jameleddine [1,3]
Ahmad, Irfan [1,4]
Affiliations
[1] KFUPM, Informat & Comp Sci Dept, Dhahran 31261, Saudi Arabia
[2] Imam Abdulrahman Bin Faisal Univ, Comp Dept, Dammam 31441, Saudi Arabia
[3] Interdisciplinary Res Ctr Intelligent Secure Syst, Dhahran 31261, Saudi Arabia
[4] KFUPM, SDAIA KFUPM Joint Res Ctr Artificial Intelligence, Dhahran 31261, Saudi Arabia
Keywords
Software traceability; Machine learning; Deep learning; Transfer learning; Systematic literature review; Link recovery; Code
DOI
10.1016/j.jss.2025.112536
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
Software traceability is the process of tracking and managing relationships between software artifacts throughout the Software Development Life Cycle (SDLC). It ensures that all software artifacts are correctly linked, which facilitates change management, impact analysis, and regulatory compliance. Automated traceability can be achieved using Information Retrieval (IR) and Machine Learning (ML) approaches. This systematic literature review summarizes and synthesizes studies on ML-based automated traceability. Given the rapid pace of ML advances, analyzing current research is crucial for progress in the field. We identified 59 studies published between 2014 and June 2024. The number of publications has increased, particularly in 2023 and continuing into 2024, and their citation impact has been sustained. Around 170 datasets from different domains are used, covering both natural-language and programming-language artifacts. Common artifacts include use cases and source code, and studies focus mainly on the Requirements Analysis and Implementation phases. Existing solutions mostly use classification and supervised learning, while deep learning and Large Language Models (LLMs) are emerging and show superior performance. We also identify challenges, gaps, and future trends to guide researchers. Challenges include imbalanced datasets, data scarcity, and limited real-world data; gaps include the handling of missing true links, the lack of benchmark datasets, and limited exploration of LLMs. Lastly, we provide recommendations for researchers based on these findings.
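The abstract notes that IR and ML are the two main routes to automated traceability and that most ML-based solutions frame the task as supervised classification. As a minimal sketch under stated assumptions (not the method of any reviewed study), the Python below combines the two ideas: a term-frequency cosine similarity, a classic IR score, serves as the single feature of a logistic-regression classifier that labels each candidate (requirement, code) pair as linked or not linked. The artifacts, link labels, and function names (`tokenize`, `cosine`, `train_logistic`, `recover_links`) are hypothetical, and for brevity the classifier is evaluated on its own training pairs rather than on held-out data.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Split camelCase and snake_case identifiers and lowercase everything."""
    text = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)  # getUser -> get User
    return re.findall(r"[a-z]+", text.lower())  # underscores/punctuation act as separators


def cosine(a: str, b: str) -> float:
    """IR-style similarity: cosine of raw term-frequency vectors."""
    ta, tb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(count * tb[term] for term, count in ta.items())
    norm = math.sqrt(sum(v * v for v in ta.values())) * math.sqrt(sum(v * v for v in tb.values()))
    return dot / norm if norm else 0.0


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(xs, ys, lr=1.0, epochs=2000):
    """Fit p(link | x) = sigmoid(w*x + b) by full-batch gradient descent on log-loss."""
    w = b = 0.0
    n = len(xs)
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # derivative of log-loss w.r.t. the logit
            gw += err * x
            gb += err
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b


def recover_links(sources, targets, w, b, threshold=0.5):
    """Classify every (source, target) pair; keep pairs predicted as trace links."""
    return {
        (s_id, t_id)
        for s_id, s_text in sources.items()
        for t_id, t_text in targets.items()
        if sigmoid(w * cosine(s_text, t_text) + b) >= threshold
    }


# Hypothetical toy artifacts and ground-truth links.
requirements = {
    "R1": "The user shall log in with a username and password.",
    "R2": "The system shall export the monthly report as PDF.",
}
code_units = {
    "LoginController": "def login(username, password): check_password(username, password)",
    "ReportExporter": "def export_report(month): render_pdf(monthly_report(month))",
}
true_links = {("R1", "LoginController"), ("R2", "ReportExporter")}

# One labeled example per candidate pair; the only feature is the cosine similarity.
pairs = [(r, c) for r in requirements for c in code_units]
xs = [cosine(requirements[r], code_units[c]) for r, c in pairs]
ys = [1 if (r, c) in true_links else 0 for r, c in pairs]

w, b = train_logistic(xs, ys)
predicted = recover_links(requirements, code_units, w, b)
print(sorted(predicted))
```

In practice, studies replace the single similarity feature with richer learned representations (for example, from deep learning models or LLMs), and they must also cope with the class imbalance the review highlights, since true links are far rarer than candidate pairs.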
Pages: 38