Machine learning approaches for automated software traceability: A systematic literature review

Cited by: 0

Authors:

Alturayeif, Nouf [1,2]

Hassine, Jameleddine [1,3]

Ahmad, Irfan [1,4]

Affiliations:

[1] KFUPM, Informat & Comp Sci Dept, Dhahran 31261, Saudi Arabia

[2] Imam Abdulrahman Bin Faisal Univ, Comp Dept, Dammam 31441, Saudi Arabia

[3] Interdisciplinary Res Ctr Intelligent Secure Syst, Dhahran 31261, Saudi Arabia

[4] KFUPM, SDAIA KFUPM Joint Res Ctr Artificial Intelligence, Dhahran 31261, Saudi Arabia

Source:

JOURNAL OF SYSTEMS AND SOFTWARE | 2025 / Vol. 230

Keywords:

Software traceability; Machine learning; Deep learning; Transfer learning; Systematic literature review; LINK RECOVERY; CODE;

DOI:

10.1016/j.jss.2025.112536

Chinese Library Classification (CLC) number:

TP31 [Computer Software];

Discipline classification code:

081202 ; 0835 ;

Abstract:

Software traceability is the process of tracking and managing relationships between software artifacts throughout the Software Development Life-Cycle (SDLC). It ensures that all software artifacts are correctly linked, facilitating change management, impact analysis, and regulatory compliance. Automated traceability can be achieved using Information Retrieval (IR) and Machine Learning (ML) approaches. This systematic literature review summarizes and synthesizes ML-based automated traceability studies. Given the rapid advances in ML, analyzing current research is crucial for progress in the field. We identified 59 studies published between 2014 and June 2024. We found an increase in publications, particularly in 2023 and continuing into 2024, with sustained citation impact. Around 170 datasets from different domains are used, covering natural-language and programming-language artifacts. Common artifacts include use cases and source code, focusing on the Requirements Analysis and Implementation phases. Existing solutions mostly use classification and supervised learning, with the emerging use of deep learning and Large Language Models (LLMs) showing superior performance. We identified challenges and gaps, along with future trends, to guide researchers. Challenges include imbalanced datasets, data scarcity, and limited real-world data, while gaps include handling missing true links, lack of benchmark datasets, and limited exploration of LLMs. Lastly, we provide recommendations for researchers based on the findings.

Pages: 38