Benefiting feature selection by the discovery of false irrelevant attributes

Cited by: 1
Authors
Chao, Lidia S. [1]
Wong, Derek F. [1]
Chen, Philip C. L. [1]
Ng, Wing W. Y. [2]
Yeung, Daniel S. [2]
Affiliations
[1] Univ Macau, Dept Comp & Informat Sci, Macau, Peoples R China
[2] S China Univ Technol, Sch Comp Sci & Engn, Guangzhou 510000, Guangdong, Peoples R China
Keywords
Supportive relevance; hidden interaction; data preprocessing; feature selection; data mining; MUTUAL INFORMATION; MICROARRAY DATA; CLASSIFICATION; RELEVANCE;
DOI
10.1142/S021969131550023X
Chinese Library Classification number
TP31 [Computer Software];
Discipline classification code
081202 ; 0835 ;
Abstract
Ordinary feature selection methods select only the explicitly relevant attributes by filtering out the irrelevant ones, trading selection accuracy for execution time and complexity. In doing so, the hidden supportive information possessed by the irrelevant attributes may be lost, so that some good attribute combinations may be missed. We believe that attributes that are useless for the classification task by themselves may sometimes provide potentially useful supportive information to other attributes and thus benefit the classification task. Retaining such attributes can minimize the information loss and can therefore maximize the classification accuracy, especially for datasets that contain hidden interactions among attributes. This paper proposes a feature selection methodology from a new angle that selects not only the relevant features but also targets the potentially useful false irrelevant attributes by measuring their supportive importance to other attributes. The empirical results validate the hypothesis by demonstrating that the proposed approach outperforms most of the state-of-the-art filter-based feature selection methods.
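The notion of a hidden interaction among attributes can be illustrated with a toy case. The sketch below is not the paper's method; it is a minimal example, assuming an XOR-labelled dataset and plain mutual information as the relevance score, in which each attribute looks irrelevant on its own yet the pair fully determines the class. The names `entropy` and `mutual_information` are helpers introduced here for illustration.

```python
from collections import Counter
from itertools import product
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of hashable values."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def mutual_information(xs, ys):
    """I(X; Y) = H(X) + H(Y) - H(X, Y), estimated from paired samples."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


# Toy dataset (an assumption for illustration): every combination of two
# binary attributes, with the class label defined as a XOR b.
rows = list(product([0, 1], repeat=2))
a = [r[0] for r in rows]
b = [r[1] for r in rows]
label = [x ^ y for x, y in rows]

# Relevance of each attribute alone, then of the two attributes jointly.
mi_a = mutual_information(a, label)
mi_b = mutual_information(b, label)
mi_ab = mutual_information(list(zip(a, b)), label)

print(f"I(a; label)    = {mi_a:.3f} bits")
print(f"I(b; label)    = {mi_b:.3f} bits")
print(f"I(a, b; label) = {mi_ab:.3f} bits")
```

On this toy data each attribute scores 0 bits individually while the pair scores 1 bit, so a filter that ranks attributes one at a time would discard both. That is the kind of "false irrelevant" attribute the abstract refers to.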
Pages: 17