A random forest model for early-stage software effort estimation for the SEERA dataset

Cited by: 7
Authors
Mustafa, Emtinan I. [1]
Osman, Rasha [1]
Affiliations
[1] Univ Khartoum, Fac Math Sci & Informat, Khartoum, Sudan
Keywords
Software effort and duration estimation; Random forest; Early-stage; The SEERA dataset; Technically constrained environments; SMOGN; COST-ESTIMATION; PROJECT EFFORT; PREDICTION; SYSTEMS
DOI
10.1016/j.infsof.2024.107413
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Context: Publicly available software cost estimation datasets are outdated and may not represent current industrial environments. Thus most research has concentrated on the development and evaluation of estimation models, with limited evidence of their applicability to industrial practice. Moreover, these datasets and models may not be applicable in (under-represented) technically and economically constrained environments such as the software development environment in Sudan.
Objective: This paper aims to develop a machine learning model that is suitable for the Sudanese software industry. To demonstrate the suitability of our approach, we evaluate our model using the publicly available SEERA (Software enginEERing in SudAn) dataset, which is a software cost estimation dataset from organizations in Sudan.
Method: We demonstrated the suitability of the SEERA dataset for effort estimation by comparing the attributes that had a high correlation with actual effort and actual duration to the cost factors identified by (Sudanese) experts. In addition, we developed an early-stage Random Forest model to estimate project effort and duration from the SEERA dataset. Early-stage estimation is in line with current Sudanese industrial practice. We investigated the impact of oversampling, feature selection, heterogeneity and local environmental factors on model accuracy.
Results: Our experimental results showed that the Random Forest model with oversampling and feature selection provided accurate estimates that were better than random guessing (standardized accuracy > 70%). Our results were similar to accuracies reported in the literature. In addition, we demonstrated that our random forest model provided estimations that were more accurate than (Sudanese) expert judgement.
Conclusion: This study has demonstrated the feasibility of our random forest model for early-stage effort and duration estimation for Sudanese software projects. The results demonstrate the importance of representative models and datasets for non-traditional technical environments. Further research is required to investigate the impact of local environmental factors on software cost estimation.
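The abstract reports accuracy as standardized accuracy relative to random guessing. As a minimal sketch, assuming the commonly used definition SA = 100 * (1 - MAR / MAR_P0), where MAR is the mean absolute residual of the model and MAR_P0 is the expected mean absolute residual when each project is "predicted" with the actual value of a different, randomly chosen project, the measure could be computed as below. The function names and the effort values are hypothetical illustrations, not taken from the paper or from the SEERA dataset, and the record does not confirm which exact variant the authors used.

```python
from itertools import permutations
from statistics import mean


def mean_absolute_residual(actual, predicted):
    """Mean absolute difference between actual and predicted values."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def random_guess_mar(actual):
    """Exact expected MAR of random guessing (assumed baseline):
    each project is predicted with the actual value of another project,
    averaged over all ordered pairs of distinct projects."""
    return mean(abs(a - b) for a, b in permutations(actual, 2))


def standardized_accuracy(actual, predicted):
    """Standardized accuracy in percent; 0 means no better than random guessing."""
    mar = mean_absolute_residual(actual, predicted)
    return 100.0 * (1.0 - mar / random_guess_mar(actual))


# Hypothetical effort values (e.g. person-hours), for illustration only.
actual = [100, 200, 400, 800]
predicted = [120, 180, 450, 700]
sa = standardized_accuracy(actual, predicted)
print(f"Standardized accuracy: {sa:.2f}%")
```

Under this definition, a value above 0 indicates estimates better than random guessing, and 100 corresponds to perfect predictions.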
Pages: 18
Related papers
69 in total
[1]   Investigating the use of random forest in software effort estimation [J].
Abdelali, Zakrani ;
Mustapha, Hain ;
Abdelwahed, Namir .
SECOND INTERNATIONAL CONFERENCE ON INTELLIGENT COMPUTING IN DATA SCIENCES (ICDS2018), 2019, 148 :343-352
[2]  
Alamdy S., 2017, J. Engin Comp. Sci.. Sudan Uni. Press, V8, P5
[3]   Software Industry Practice in Africa: Case Study Sudan [J].
Alamdy, Saleh ;
Osman, Rasha .
2017 IEEE 41ST ANNUAL COMPUTER SOFTWARE AND APPLICATIONS CONFERENCE (COMPSAC), VOL 1, 2017, :743-748
[4]  
[Anonymous], 2014, 18 INT C EVALUATION, DOI 10.1145/2601248.2601284
[5]  
[Anonymous], 2021, Anaconda
[6]  
[Anonymous], 2005, WORKSHOP PREDICTOR M, DOI 10.1145/1083165.1083166
[7]   Predicting software effort from use case points: A systematic review [J].
Azzeh, Mohammad ;
Nassif, Ali Bou ;
Attili, Imtinan Basem .
SCIENCE OF COMPUTER PROGRAMMING, 2021, 204
[8]   A hybrid model for estimating software project effort from Use Case Points [J].
Azzeh, Mohammad ;
Nassif, Ali Bou .
APPLIED SOFT COMPUTING, 2016, 49 :981-989
[9]   A new perspective on data homogeneity in software cost estimation: a study in the embedded systems domain [J].
Bakir, Ayse ;
Turhan, Burak ;
Bener, Ayse B. .
SOFTWARE QUALITY JOURNAL, 2010, 18 (01) :57-80
[10]   A flexible method to estimate the software development effort based on the classification of projects and localization of comparisons [J].
Bardsiri, Vahid Khatibi ;
Jawawi, Dayang Norhayati Abang ;
Hashim, Siti Zaiton Mohd ;
Khatibi, Elham .
EMPIRICAL SOFTWARE ENGINEERING, 2014, 19 (04) :857-884