A tree-based algorithm for attribute selection

Cited by: 8
Authors
Baranauskas, Jose Augusto [1]
Netto, Oscar Picchi [1]
Nozawa, Sergio Ricardo [2]
Macedo, Alessandra Alaniz [1]
Affiliations
[1] Univ Sao Paulo, Dept Comp Sci & Math, Fac Philosophy Sci & Languages Ribeirao Preto, Ave Bandeirantes 3900, BR-14040901 Ribeirao Preto, SP, Brazil
[2] Dow AgroSci Seeds Traits Oils, Ave Antonio Diederichsen 400, BR-14020250 Ribeirao Preto, SP, Brazil
Keywords
Attribute selection; Filter; Decision tree; High dimensional data; Data pre-processing; WRAPPERS;
DOI
10.1007/s10489-017-1008-y
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper presents an improved version of a decision tree-based filter algorithm for attribute selection. The algorithm can be seen as a pre-processing step for the induction algorithms used in machine learning and data mining tasks. The filter was evaluated on thirty medical datasets with respect to its execution time, data compression ability, and AUC (Area Under the ROC Curve) performance. On average, our filter was faster than Relief-F but slower than both CFS and Gain Ratio. However, for low-density (high-dimensional) datasets, our approach selected fewer than 2% of all attributes without degrading performance in a subsequent evaluation with five different machine learning algorithms.
Pages: 821-833
Number of pages: 13