A Filter Based Feature Selection Algorithm Using Null Space of Covariance Matrix for DNA Microarray Gene Expression Data

Cited by: 20
Authors
Sharma, Alok [1 ,2 ,3 ]
Imoto, Seiya [1 ]
Miyano, Satoru [1 ]
Affiliations
[1] Univ Tokyo, Ctr Human Genome, Lab DNA Informat Anal, Tokyo 1138654, Japan
[2] Griffith Univ, Signal Proc Lab, Nathan, Qld 4111, Australia
[3] Univ S Pacific, Sch Engn & Phys, Suva, Fiji
Keywords
Cancer classification; covariance matrix; DNA microarray gene expression data; feature or gene selection; filter based method; null space; SINGULAR-VALUE DECOMPOSITION; SAMPLE-SIZE PROBLEM; CLASSIFICATION; CANCER; PREDICTION; LDA
DOI
10.2174/157489312802460802
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
We propose a new filter based feature selection algorithm for classification of DNA microarray gene expression data. It uses the null space of the covariance matrix to select features. The algorithm can discard a large number of features (genes) in bulk while retaining, in the reduced subset, the information needed for discrimination. It can therefore serve as a pre-processing step for other feature selection algorithms. The algorithm does not assume statistical independence among the features. It shows promising classification accuracy compared with existing techniques on several DNA microarray gene expression datasets.
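The abstract does not spell out how the null space is turned into a feature ranking, so the following is only a minimal sketch of one plausible reading, not the authors' algorithm. It assumes a null-space LDA style scheme: class means are projected onto the null space of the within-class covariance matrix, and each gene is scored by how much of it survives the projection. The function names `null_space_feature_scores` and `select_top_features`, the tolerance `tol`, and the scoring rule itself are all hypothetical choices made for illustration.

```python
import numpy as np

def null_space_feature_scores(X, y, tol=1e-10):
    """Illustrative scores (assumed rule, not taken from the paper).

    X: (n_samples, n_features) expression matrix, typically n_samples << n_features.
    y: (n_samples,) class labels.
    """
    X = np.asarray(X, dtype=float)
    classes, idx = np.unique(np.asarray(y), return_inverse=True)
    # Class means, one row per class.
    means = np.vstack([X[idx == c].mean(axis=0) for c in range(len(classes))])
    # Center each sample on its own class mean; these rows span the range
    # of the within-class covariance matrix.
    Xw = X - means[idx]
    # Thin SVD avoids forming the (n_features x n_features) covariance matrix.
    _, s, Vt = np.linalg.svd(Xw, full_matrices=False)
    V_range = Vt[s > tol * s.max()].T  # orthonormal basis of the range space
    # Remove the range-space component of the centered class means; what is
    # left lies in the null space of the within-class covariance matrix.
    M = means - X.mean(axis=0)
    M_null = M - (M @ V_range) @ V_range.T
    # Assumed scoring rule: per-feature magnitude of the projected means.
    return np.sqrt((M_null ** 2).sum(axis=0))

def select_top_features(X, y, k):
    """Return the indices of the k highest-scoring features."""
    scores = null_space_feature_scores(X, y)
    return np.argsort(scores)[::-1][:k]
```

Because the samples are few and the genes many, the within-class covariance matrix is heavily rank deficient, which is why the sketch works from the SVD of the centered data rather than from the covariance matrix itself. A reduced subset obtained this way could then be passed to any other feature selection method, in line with the pre-processing role the abstract describes.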
Pages: 289-294
Number of pages: 6