Stability of Filter- and Wrapper-based Software Metric Selection Techniques

Cited by: 0
Authors
Wang, Huanjing [1 ]
Khoshgoftaar, Taghi M. [2 ]
Napolitano, Amri [2 ]
Affiliations
[1] Western Kentucky Univ, Bowling Green, KY 42101 USA
[2] Florida Atlantic Univ, Boca Raton, FL 33431 USA
Source
2014 IEEE 15TH INTERNATIONAL CONFERENCE ON INFORMATION REUSE AND INTEGRATION (IRI) | 2014
Keywords
feature subset selection; software measurements; filters; wrappers; stability; algorithms
DOI
Not available
Chinese Library Classification (CLC)
TP [automation technology, computer technology]
Discipline Classification Code
0812
Abstract
For most software systems, some of the software metrics collected during the software development cycle may contain redundant information, provide no information, or even have an adverse effect on prediction models built with those metrics. Intelligently selecting software metrics (features) with feature selection techniques, which reduce the feature set to a more manageable size, before building defect prediction models may improve the final defect prediction results. While some feature selection techniques consider each feature individually, feature subset selection evaluates entire feature subsets and thus can help remove redundant features. Unfortunately, feature subset selection may select different features from similar datasets. This paper addresses the question of which feature subset selection methods are stable in the face of changes to the data (here, the addition or removal of instances). We examine twenty-seven feature subset selection methods: two filter-based techniques and twenty-five wrapper-based techniques (five choices of wrapper learner combined with five choices of wrapper performance metric). We use the Average Tanimoto Index (ATI) as our stability metric because it can compare two feature subsets of different sizes. All experiments were conducted on three software metric datasets from a real-world software project. Our results show that the Correlation-Based Feature Selection (CFS) approach has the greatest stability overall, with every wrapper-based technique proving less stable than CFS. Among the twenty-five wrappers, those built from the Naive Bayes learner with either the Area Under the Receiver Operating Characteristic (ROC) Curve (AUC) or the Area Under the Precision-Recall Curve (PRC) as the performance metric are, in general, the most stable.
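A minimal sketch of the stability measurement described in the abstract, assuming the usual formulation of the Average Tanimoto Index: the Tanimoto (Jaccard) similarity |A ∩ B| / |A ∪ B| between two selected feature subsets, averaged over every pair of subsets chosen from perturbed versions of the same dataset. The software metric names in the example are illustrative, not taken from the paper.

```python
from itertools import combinations

def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two feature subsets;
    remains well-defined when the subsets differ in size."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty selections are trivially identical
    return len(a & b) / len(a | b)

def average_tanimoto_index(subsets):
    """Average pairwise Tanimoto similarity over the feature subsets
    selected from perturbed samples of the same dataset."""
    pairs = list(combinations(subsets, 2))
    return sum(tanimoto(a, b) for a, b in pairs) / len(pairs)

# Hypothetical subsets selected from three perturbed samples of one dataset
subsets = [{"loc", "cyclomatic", "fan_out"},
           {"loc", "cyclomatic"},
           {"loc", "fan_out", "halstead_volume"}]
print(average_tanimoto_index(subsets))  # closer to 1.0 means more stable
```

Because the similarity is normalized by the size of the union, the index stays meaningful when the compared subsets have different sizes, which is the property the abstract highlights.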
Pages: 309-314
Number of pages: 6