A study of mutual information based feature selection for case based reasoning in software cost estimation

Cited by: 72
Authors
Li, Y. F. [1]
Xie, M. [1]
Go, T. N. [1]
Affiliations
[1] Natl Univ Singapore, Dept Ind & Syst Engn, Singapore 119260, Singapore
Keywords
Case based reasoning; Feature selection; Mutual information; Software cost estimation; INPUT FEATURE-SELECTION; EFFORT PREDICTION; NEURAL-NETWORK; ANALOGY; MODELS; REGRESSION; WEIGHTS; SYSTEMS;
DOI
10.1016/j.eswa.2008.07.062
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Software cost estimation is one of the most crucial activities in the software development process. Over the past decades, many methods have been proposed for cost estimation, and case based reasoning (CBR) is one of them. Feature selection is an important preprocessing stage of case based reasoning. Most existing feature selection methods for case based reasoning are 'wrappers', which usually yield high fitting accuracy at the cost of high computational complexity and low explainability of the selected features. In our study, a mutual information based feature selection method (MICBR) is proposed. This approach is a hybrid of the 'wrapper' and 'filter' mechanisms; a filter is another kind of feature selector with much lower complexity than a wrapper, and the features selected by filters are likely to generalize to other conditions. MICBR is then compared with popular feature selectors and with published works. The results show that MICBR is an effective feature selector for case based reasoning, overcoming some of the limitations and computational complexities of other feature selection techniques in the field. (C) 2008 Elsevier Ltd. All rights reserved.
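As a rough illustration of the 'filter' idea mentioned in the abstract, the sketch below ranks discrete features by their estimated mutual information with a target variable. It is not the authors' MICBR procedure: the function names, the toy project data, and the plain frequency-count estimator are all assumptions made for this example, and continuous features would need to be discretized first.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Estimate I(X;Y) in bits from two equal-length sequences of discrete values."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi


def rank_features(rows, target):
    """Return (feature name, MI score) pairs, highest score first."""
    names = list(rows[0].keys())
    scores = {
        name: mutual_information([row[name] for row in rows], target)
        for name in names
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data: four past projects and their effort class.
projects = [
    {"team_size": "small", "language": "java"},
    {"team_size": "small", "language": "c"},
    {"team_size": "large", "language": "java"},
    {"team_size": "large", "language": "c"},
]
effort = ["low", "low", "high", "high"]

ranking = rank_features(projects, effort)
for name, score in ranking:
    print(f"{name}: {score:.3f} bits")
```

In this toy data the hypothetical `team_size` feature fully determines the effort class while `language` is independent of it, so a filter of this kind would keep the former and discard the latter without ever running the estimator that a wrapper would need.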
Pages: 5921-5931
Number of pages: 11