Requirement Dependency Extraction Based on Improved Stacking Ensemble Machine Learning

Cited by: 2

Authors:

Guan, Hui [1,2]

Xu, Hang [1]

Cai, Lie [1]

Affiliations:

[1] Shenyang Univ Chem Technol, Dept Comp Sci & Technol, Shenyang 110142, Peoples R China

[2] Shenyang Univ Chem Technol, Key Lab Ind Intelligence Technol Chem Proc, Shenyang 110142, Peoples R China

Source:

MATHEMATICS | 2024, Vol. 12, Issue 9

Keywords:

requirement dependency; machine learning; part-of-speech features; particle swarm optimization; ensemble learning; low correlation algorithm; grid search algorithm;

DOI:

10.3390/math12091272

Chinese Library Classification (CLC) number:

O1 [Mathematics];

Discipline classification codes:

0701 ; 070101 ;

Abstract:

Manually analysing requirement dependencies in requirements engineering is costly and inefficient. To address this, the paper proposes a requirement dependency extraction method based on part-of-speech features and an improved stacking ensemble learning model (P-Stacking). First, to avoid relying on a single feature type during feature extraction, the method integrates part-of-speech features, TF-IDF features, and Word2Vec features at the feature selection stage. A particle swarm optimization algorithm allocates weights to part-of-speech tags, which increases the significance of crucial information in requirement texts. Second, to overcome the performance limitations of standalone machine learning models, an improved stacking model is proposed. P-Stacking uses the low correlation algorithm and the grid search algorithm to automatically select the optimal combination of base models, which reduces manual intervention and improves prediction performance. The experimental results show that, compared with the method based on TF-IDF features, integrating part-of-speech features and Word2Vec features improved the highest F1 scores of a standalone machine learning model on the three datasets by 3.89%, 10.68%, and 21.4%, respectively. Compared with the method based on a standalone machine learning model, the improved stacking ensemble model improved F1 scores by 2.29%, 5.18%, and 7.47% on the three datasets, respectively.
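The abstract does not define the low correlation algorithm, so the following is only a minimal sketch of one plausible reading: choose the base models whose validation predictions are least correlated with each other, on the assumption that diverse base models give a stacking ensemble more to combine. The function names, the use of Pearson correlation, the model names, and the prediction vectors are all hypothetical illustrations, not the paper's implementation. The grid search step is not covered here.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # Assumption: constant predictions are treated as uncorrelated.
        return 0.0
    return cov / sqrt(var_x * var_y)


def select_low_correlation(predictions, k):
    """Pick the k (k >= 2) model names with the lowest mean absolute
    pairwise correlation between their prediction vectors."""
    def mean_abs_corr(names):
        pairs = list(combinations(names, 2))
        total = sum(abs(pearson(predictions[a], predictions[b])) for a, b in pairs)
        return total / len(pairs)

    # Exhaustive search over subsets; fine for a handful of candidate models.
    return min(combinations(sorted(predictions), k), key=mean_abs_corr)


# Hypothetical validation-set predictions (1 = dependency, 0 = no dependency).
predictions = {
    "svm": [1, 0, 1, 1, 0, 0, 1, 0],
    "logreg": [1, 0, 1, 1, 0, 0, 1, 0],
    "naive_bayes": [1, 1, 0, 1, 0, 1, 0, 0],
    "random_forest": [0, 0, 1, 1, 1, 0, 1, 0],
}
selected = select_low_correlation(predictions, k=2)
print(selected)
```

In this toy input the hypothetical `svm` and `logreg` models predict identically, so the selection never pairs them. The chosen models would then serve as base learners whose outputs feed the stacking meta-learner.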

Pages: 37