Selecting significant genes by randomization test for cancer classification using gene expression data

Cited by: 26
Authors
Mao, Zhiyi [1, 2]
Cai, Wensheng [1, 2]
Shao, Xueguang [1, 2]
Affiliations
[1] Nankai Univ, State Key Lab Med Chem Biol, Coll Chem, Tianjin 300071, Peoples R China
[2] Nankai Univ, Res Ctr Analyt Sci, Coll Chem, Tianjin 300071, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Gene expression data; Randomization test; Partial least squares discriminant analysis; Gene selection; Cancer classification; PARTIAL LEAST-SQUARES; MICROARRAY DATA; MULTIVARIATE CALIBRATION; MOLECULAR CLASSIFICATION; DISCRIMINANT-ANALYSIS; TUMOR CLASSIFICATION; VARIABLE SELECTION; LUNG-CANCER; PREDICTION; COMPONENT;
DOI
10.1016/j.jbi.2013.03.009
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Gene selection is an important task in bioinformatics, because the accuracy of cancer classification generally depends on the genes that are biologically relevant to the classification problem. In this work, a randomization test (RT) is used as a gene selection method for gene expression data. In the method, a statistic derived from the regression coefficients of a series of partial least squares discriminant analysis (PLSDA) models is used to evaluate the significance of each gene. Informative genes are selected for classifying four gene expression datasets (prostate cancer, lung cancer, leukemia, and non-small cell lung cancer (NSCLC)), and the rationality of the results is validated by multiple linear regression (MLR) modeling and principal component analysis (PCA). With the selected genes, satisfactory results can be obtained. © 2013 Elsevier Inc. All rights reserved.
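The abstract does not spell out the exact statistic, so the following is only a minimal sketch of one common form of a randomization test on PLS regression coefficients, not the authors' implementation. It assumes a two-class problem coded as 0/1, a single-response PLS model fitted with NIPALS, and a permutation p-value defined as the fraction of label shuffles whose absolute coefficient is at least as large as the observed one. The function names, the number of components, the number of permutations, and the synthetic data are all hypothetical.

```python
import numpy as np


def pls1_coefficients(X, y, n_components):
    """Return PLS1 regression coefficients (NIPALS) after mean-centering X and y."""
    Xk = X - X.mean(axis=0)
    yk = y - y.mean()
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        norm = np.linalg.norm(w)
        if norm < 1e-12:              # no covariance left to model
            break
        w = w / norm                  # weight vector
        t = Xk @ w                    # score vector
        tt = t @ t
        p = Xk.T @ t / tt             # X loading
        q_a = (yk @ t) / tt           # y loading
        Xk = Xk - np.outer(t, p)      # deflate X
        yk = yk - q_a * t             # deflate y
        W.append(w)
        P.append(p)
        q.append(q_a)
    if not W:
        return np.zeros(X.shape[1])
    W = np.column_stack(W)
    P = np.column_stack(P)
    # b = W (P^T W)^{-1} q
    return W @ np.linalg.solve(P.T @ W, np.array(q))


def randomization_test(X, y, n_components=2, n_permutations=200, seed=0):
    """Per-gene permutation p-values based on |PLS regression coefficient|.

    Assumption: the p-value is the share of label permutations in which a
    gene's absolute coefficient reaches or exceeds its observed value.
    """
    rng = np.random.default_rng(seed)
    b_obs = np.abs(pls1_coefficients(X, y, n_components))
    exceed = np.zeros(X.shape[1])
    for _ in range(n_permutations):
        y_perm = rng.permutation(y)   # break the gene/label association
        b_perm = np.abs(pls1_coefficients(X, y_perm, n_components))
        exceed += b_perm >= b_obs
    # The +1 correction keeps p-values strictly positive.
    return (exceed + 1) / (n_permutations + 1)


# Synthetic illustration only: 40 samples, 50 "genes", the first 3 made informative.
rng = np.random.default_rng(42)
y = np.repeat([0.0, 1.0], 20)
X = rng.normal(size=(40, 50))
X[:, :3] += 2.0 * y[:, None]

p_values = randomization_test(X, y)
print("gene indices with the smallest p-values:", np.argsort(p_values)[:5])
```

Genes with small p-values would be the candidates to keep. The threshold, the number of latent variables, and any preprocessing such as autoscaling are choices the abstract leaves open.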
Pages: 594-601
Number of pages: 8