Distributed feature selection: An application to microarray data classification

Times cited: 132
Authors
Bolon-Canedo, V. [1]
Sanchez-Marono, N. [1]
Alonso-Betanzos, A. [1]
Affiliations
[1] Univ A Coruna, Dept Comp Sci, Lab Res & Dev Artificial Intelligence LIDIA, La Coruna 15071, Spain
Keywords
Feature selection; Distributed learning; Microarray data; Ensemble
DOI
10.1016/j.asoc.2015.01.035
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Feature selection is often required as a preliminary step for many pattern recognition problems. However, most existing algorithms work only in a centralized fashion, i.e. using the whole dataset at once. In this research, a new method for distributing the feature selection process is proposed. It distributes the data by features, i.e. according to a vertical distribution, and then performs a merging procedure which updates the feature subset according to improvements in the classification accuracy. The effectiveness of our proposal is tested on microarray data, which poses a difficult challenge for researchers because of the large number of gene expression values and the small sample size. The results on eight microarray datasets show that the execution time is considerably shortened, whereas the performance is maintained or even improved compared to the standard algorithms applied to the non-partitioned datasets. © 2015 Elsevier B.V. All rights reserved.
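The scheme outlined in the abstract (vertical partitioning of the features, a selection step on each partition, and an accuracy-driven merge) can be illustrated with a minimal sketch. This is not the authors' implementation. The round-robin partitioning, the class-mean-gap filter, the nearest-centroid classifier with leave-one-out evaluation, the function names, and the toy data are all assumptions made for illustration.

```python
from statistics import mean

def partition_features(n_features, n_parts):
    """Vertical partition: split feature indices into disjoint subsets
    (round-robin here; an assumption, chosen to keep the sketch deterministic)."""
    return [list(range(i, n_features, n_parts)) for i in range(n_parts)]

def filter_score(X, y, j):
    """Hypothetical filter: absolute gap between the class means of feature j
    (binary labels 0/1 assumed)."""
    a = [row[j] for row, c in zip(X, y) if c == 0]
    b = [row[j] for row, c in zip(X, y) if c == 1]
    return abs(mean(a) - mean(b))

def loo_accuracy(X, y, feats):
    """Leave-one-out accuracy of a nearest-centroid classifier on `feats`."""
    if not feats:
        return 0.0
    hits = 0
    for i in range(len(X)):
        centroids = {}
        for c in set(y):
            rows = [X[k] for k in range(len(X)) if k != i and y[k] == c]
            if rows:  # skip a class left empty by the held-out sample
                centroids[c] = [mean(r[j] for r in rows) for j in feats]
        pred = min(
            centroids,
            key=lambda c: sum((X[i][j] - m) ** 2
                              for j, m in zip(feats, centroids[c])),
        )
        hits += pred == y[i]
    return hits / len(X)

def distributed_selection(X, y, n_parts=2, top_k=1):
    """Select features per partition, then merge: a partition's candidates
    are kept only if they improve the classification accuracy."""
    selected, best = [], 0.0
    for part in partition_features(len(X[0]), n_parts):
        ranked = sorted(part, key=lambda j: filter_score(X, y, j), reverse=True)
        candidate = selected + ranked[:top_k]
        acc = loo_accuracy(X, y, candidate)
        if acc > best:
            selected, best = candidate, acc
    return sorted(selected), best

# Toy data (invented): 6 samples, 4 features; only feature 0 separates the classes.
X = [
    [0.0, 5.0, 1.0, 3.0],
    [0.1, 1.0, 2.0, 3.1],
    [0.2, 3.0, 1.5, 2.9],
    [1.0, 4.0, 1.2, 3.0],
    [1.1, 2.0, 1.8, 3.2],
    [0.9, 3.5, 1.4, 2.8],
]
y = [0, 0, 0, 1, 1, 1]

selected, acc = distributed_selection(X, y, n_parts=2, top_k=1)
print(selected, acc)  # prints: [0] 1.0
```

In this sketch each partition is processed in turn within one process. In a genuinely distributed setting, the per-partition ranking could run on separate workers, with only the merge step needing the combined candidates.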
Pages: 136-150
Number of pages: 15