Benchmarking attribute selection techniques for discrete class data mining

Cited by: 743
Authors
Hall, MA [1]
Holmes, G [1]
Affiliations
[1] Univ Waikato, Dept Comp Sci, Hamilton, New Zealand
Keywords
attribute selection; classification; benchmarking
DOI
10.1109/TKDE.2003.1245283
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Data engineering is generally considered to be a central issue in the development of data mining applications. The success of many learning schemes, in their attempts to construct models of data, hinges on the reliable identification of a small set of highly predictive attributes. The inclusion of irrelevant, redundant, and noisy attributes in the model-building process can result in poor predictive performance and increased computation. Attribute selection generally involves a combination of search and attribute utility estimation plus evaluation with respect to specific learning schemes. This leads to a large number of possible permutations and has led to a situation where very few benchmark studies have been conducted. This paper presents a benchmark comparison of several attribute selection methods for supervised classification. All the methods produce an attribute ranking, a useful device for isolating the individual merit of an attribute. Attribute selection is achieved by cross-validating the attribute rankings with respect to a classification learner to find the best attributes. Results are reported for a selection of standard data sets and two diverse learning schemes, C4.5 and naive Bayes.
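The procedure the abstract outlines (rank the attributes, then cross-validate the ranking against a learner to decide how many top-ranked attributes to keep) can be illustrated with a minimal sketch. It is not the authors' implementation. Information gain is assumed here as a stand-in ranker, since the abstract does not name the ranking methods compared, and the learner is a small hand-written naive Bayes for discrete attributes. All function names (`rank_attributes`, `cv_accuracy`, `select_attributes`, and the rest) are hypothetical.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, attr):
    """Reduction in class entropy from splitting on one discrete attribute."""
    groups = defaultdict(list)
    for row, y in zip(rows, labels):
        groups[row[attr]].append(y)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def rank_attributes(rows, labels):
    """Attribute indices sorted from highest to lowest information gain."""
    return sorted(range(len(rows[0])),
                  key=lambda a: info_gain(rows, labels, a), reverse=True)

def nb_fit(rows, labels, attrs):
    """Collect the counts a naive Bayes model needs for the chosen attributes."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (attribute, class) -> value counts
    values = defaultdict(set)             # attribute -> values seen in training
    for row, y in zip(rows, labels):
        for a in attrs:
            value_counts[(a, y)][row[a]] += 1
            values[a].add(row[a])
    return class_counts, value_counts, values

def nb_predict(model, row, attrs):
    """Return the class with the highest Laplace-smoothed log posterior."""
    class_counts, value_counts, values = model
    total = sum(class_counts.values())
    def score(y):
        s = math.log(class_counts[y] / total)
        for a in attrs:
            s += math.log((value_counts[(a, y)][row[a]] + 1)
                          / (class_counts[y] + len(values[a])))
        return s
    return max(sorted(class_counts), key=score)

def cv_accuracy(rows, labels, k, folds=5):
    """Cross-validated accuracy when only the top-k ranked attributes are used."""
    correct = 0
    for f in range(folds):
        train = [i for i in range(len(rows)) if i % folds != f]
        test = [i for i in range(len(rows)) if i % folds == f]
        tr_rows = [rows[i] for i in train]
        tr_labels = [labels[i] for i in train]
        # Rank inside the fold so held-out instances never influence the ranking.
        attrs = rank_attributes(tr_rows, tr_labels)[:k]
        model = nb_fit(tr_rows, tr_labels, attrs)
        correct += sum(nb_predict(model, rows[i], attrs) == labels[i] for i in test)
    return correct / len(rows)

def select_attributes(rows, labels, folds=5):
    """Keep the shortest prefix of the ranking with the best CV accuracy."""
    n_attrs = len(rows[0])
    best_k = max(range(1, n_attrs + 1),
                 key=lambda k: cv_accuracy(rows, labels, k, folds))
    return rank_attributes(rows, labels)[:best_k]
```

Calling `select_attributes(rows, labels)` with `rows` as a list of equal-length tuples of discrete values returns the indices of the retained attributes. Ties in accuracy resolve to the smaller attribute count, which is a design choice of this sketch and not something stated in the abstract.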
Pages: 1437-1447
Number of pages: 11