Requirement Dependency Extraction Based on Improved Stacking Ensemble Machine Learning

Cited by: 2
Authors
Guan, Hui [1, 2]
Xu, Hang [1]
Cai, Lie [1]
Affiliations
[1] Shenyang Univ Chem Technol, Dept Comp Sci & Technol, Shenyang 110142, Peoples R China
[2] Shenyang Univ Chem Technol, Key Lab Ind Intelligence Technol Chem Proc, Shenyang 110142, Peoples R China
Keywords
requirement dependency; machine learning; part-of-speech features; particle swarm optimization; ensemble learning; low correlation algorithm; grid search algorithm
DOI
10.3390/math12091272
Chinese Library Classification
O1 [Mathematics]
Discipline classification code
0701; 070101
Abstract
To reduce the cost and improve the efficiency of manually analysing requirement dependencies in requirements engineering, this paper proposes a requirement dependency extraction method based on part-of-speech features and an improved stacking ensemble learning model (P-Stacking). First, to avoid relying on a single type of feature, the method combines part-of-speech features, TF-IDF features, and Word2Vec features during the feature selection stage. A particle swarm optimization algorithm assigns weights to part-of-speech tags, which increases the influence of crucial information in requirement texts. Second, to overcome the performance limits of standalone machine learning models, an improved stacking model is proposed. P-Stacking uses the Low Correlation Algorithm and the Grid Search Algorithm to select the optimal combination of base models automatically, which reduces manual intervention and improves prediction performance. Experimental results show that, compared with the method based on TF-IDF features, integrating part-of-speech and Word2Vec features improved the highest F1 score of a standalone machine learning model on the three datasets by 3.89%, 10.68%, and 21.4%, respectively. Compared with a standalone machine learning model, the improved stacking ensemble model improved F1 scores by 2.29%, 5.18%, and 7.47% on the three datasets, respectively.
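The abstract does not spell out how the Low Correlation Algorithm chooses base models. As a rough illustration only, the sketch below assumes one common reading: candidate models whose validation errors are least correlated with each other are preferred, because they tend to make different mistakes. The function names, model names, and error vectors are all hypothetical and are not taken from the paper.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant vector carries no correlation signal
    return cov / sqrt(var_x * var_y)


def select_low_correlation(errors, k):
    """Return the k model names (k >= 2) with the lowest mean pairwise error correlation."""
    def mean_corr(names):
        pairs = list(combinations(names, 2))
        return sum(pearson(errors[a], errors[b]) for a, b in pairs) / len(pairs)

    # Exhaustive search over all k-sized subsets; fine for a handful of candidates.
    return min(combinations(sorted(errors), k), key=mean_corr)


# Hypothetical 0/1 error indicators (1 = misclassified) on eight validation samples.
errors = {
    "svm": [1, 0, 0, 1, 0, 0, 1, 0],
    "logreg": [1, 0, 0, 1, 0, 0, 1, 0],          # makes the same mistakes as "svm"
    "naive_bayes": [0, 1, 0, 0, 1, 0, 0, 0],
    "random_forest": [0, 0, 1, 0, 0, 1, 0, 0],
}
selected = select_low_correlation(errors, 3)
print(selected)
```

In this toy data the two models with identical error patterns are never selected together, since pairing them raises the subset's average correlation. The selected subset would then serve as the base layer of a stacking ensemble, with hyperparameters tuned separately, for example by grid search.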
Pages: 37