Analysis of attribute weighting heuristics for analogy-based software effort estimation method AQUA+

Cited by: 38
Authors
Li, Jingzhou [1]
Ruhe, Guenther [1]
Affiliations
[1] Univ Calgary, Software Engn Decis Support Lab, Calgary, AB T2N 1N4, Canada
Keywords
effort estimation by analogy; attribute weighting; feature selection; rough set analysis; learning; heuristics;
DOI
10.1007/s10664-007-9054-4
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Estimation by analogy (EBA) predicts effort for a new project by aggregating effort information of similar projects from a given historical data set. Existing research has shown that careful selection and weighting of attributes may improve the performance of estimation methods. This paper continues along that research line and considers weighting of attributes in order to improve estimation accuracy. More specifically, the impact of weighting (and selection) of attributes is studied as an extension to our former EBA method AQUA, which has shown promising results and also allows estimation for data sets that have non-quantitative attributes and missing values. The resulting new method is called AQUA+. For attribute weighting, a qualitative analysis pre-step using rough set analysis (RSA) is performed. RSA is a proven machine learning technique for classification of objects. We exploit the RSA results in different ways and define four heuristics for attribute weighting. AQUA+ was evaluated in two ways: (1) comparison between AQUA+ and AQUA, along with a comparative analysis of the four proposed heuristics for AQUA+; and (2) comparison of AQUA+ with other EBA methods. The main evaluation results are: (1) AQUA+ obtained better estimation accuracy than AQUA over all six data sets; and (2) AQUA+ obtained results better than, or very close to, those of other EBA methods for the three data sets applied to all the EBA methods. In conclusion, the proposed attribute weighting method using RSA can improve the estimation accuracy of the EBA method AQUA+ according to the empirical studies over six data sets. Testing more data sets is necessary to obtain results that are more statistically significant.
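The general idea of weighted estimation by analogy described in the abstract can be illustrated with a minimal sketch. This is not the AQUA+ algorithm: the function names, the range-normalized similarity measure, the exact-match rule for non-quantitative attributes, and the hand-supplied weights are all assumptions made for illustration. In the paper, the weights are derived from rough set analysis heuristics, which this sketch does not implement.

```python
from typing import Optional, Sequence, Union

# An attribute value may be numeric, categorical (str), or missing (None).
Value = Union[float, str, None]


def local_similarity(a: Value, b: Value, value_range: float) -> float:
    """Similarity of two non-missing values of one attribute, in [0, 1].

    Assumption: categorical values match exactly or not at all; numeric
    values are compared by range-normalized absolute difference.
    """
    if isinstance(a, str) or isinstance(b, str):
        return 1.0 if a == b else 0.0
    if value_range == 0:
        return 1.0
    return max(0.0, 1.0 - abs(a - b) / value_range)


def global_similarity(target: Sequence[Value], candidate: Sequence[Value],
                      weights: Sequence[float],
                      ranges: Sequence[float]) -> float:
    """Weighted mean of local similarities; missing values are skipped."""
    num = den = 0.0
    for a, b, w, r in zip(target, candidate, weights, ranges):
        if a is None or b is None:
            continue  # assumption: ignore attributes with missing data
        num += w * local_similarity(a, b, r)
        den += w
    return num / den if den > 0 else 0.0


def estimate_effort(target: Sequence[Value],
                    history: Sequence[Sequence[Value]],
                    efforts: Sequence[float],
                    weights: Sequence[float],
                    ranges: Sequence[float],
                    k: int = 2) -> Optional[float]:
    """Similarity-weighted mean effort of the k most similar projects."""
    scored = sorted(
        ((global_similarity(target, p, weights, ranges), e)
         for p, e in zip(history, efforts)),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    if not scored:
        return None
    total = sum(s for s, _ in scored)
    if total == 0:
        # No usable similarity information: fall back to a plain mean.
        return sum(e for _, e in scored) / len(scored)
    return sum(s * e for s, e in scored) / total
```

Raising the weight of an attribute makes differences in that attribute count more when ranking historical projects, which is the lever that attribute weighting heuristics adjust.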
Pages: 63-96
Page count: 34