PLS-Based Gene Selection and Identification of Tumor-Specific Genes

Times cited: 30
Authors
Ji, Guoli [1]
Yang, Zijiang [2]
You, Wenjie [1,3]
Affiliations
[1] Xiamen Univ, Dept Automat, Xiamen 361005, Fujian, Peoples R China
[2] York Univ, Sch Informat Technol, Toronto, ON M3J 1P3, Canada
[3] Fujian Normal Univ, Dept Math & Comp Sci, Fuzhou 350300, Fujian, Peoples R China
Source
IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART C-APPLICATIONS AND REVIEWS | 2011 / Vol. 41 / No. 6
Funding
National Natural Science Foundation of China; Natural Sciences and Engineering Research Council of Canada
Keywords
Gene selection; high-dimensional small samples; partial least squares (PLS); tumor-specific gene; PARTIAL LEAST-SQUARES; CANCER CLASSIFICATION; MUTUAL INFORMATION; PREDICTION
DOI
10.1109/TSMCC.2010.2078503
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In view of the high dimensionality, small sample size, strong correlation, and high noise that characterize the identification of tumor-specific genes from microarray data, a novel partial least squares (PLS)-based gene-selection method that incorporates gene relatedness and is suitable for multicategory classification is presented. Based on the differences in how well the independent variables (genes) explain the dependent variable (class), we define three indicators for global gene selection that take into account the combined effects of all genes and the correlations among them. Combined with a linear-kernel support vector classifier (SVC), the proposed method is tested on the MIT acute myeloid leukemia/acute lymphoblastic leukemia (AML/ALL) and small round blue cell tumor (SRBCT) data sets. A small subset of specific genes with high discriminative power is obtained. The results indicate that the proposed PLS-based method for selecting tumor-specific genes is highly efficient. Compared with the literature, the specific genes selected from both the two-category data set AML/ALL and the multicategory data set SRBCT are credible. Further investigation shows that the proposed gene-selection method is robust. Overall, the proposed method can effectively solve the feature-selection problem for high-dimensional small-sample data, and it also performs well in multicategory classification.
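The abstract does not define the three indicators, so the sketch below does not reproduce them. As a stand-in it uses the standard variable-importance-in-projection (VIP) score, which likewise ranks each gene by how much of the class (dependent variable) variance its PLS weights account for across all components. Everything here is an assumption for illustration: PLS2 fitted by NIPALS, one-hot class coding so the same code covers two-class and multicategory data, autoscaled genes, the hypothetical helper names `pls_nipals`, `vip_scores` and `select_genes`, and synthetic data in place of AML/ALL or SRBCT.

```python
import numpy as np


def pls_nipals(X, Y, n_components, tol=1e-10, max_iter=500):
    """PLS2 by NIPALS on column-centered X (n x p) and Y (n x q).

    Returns X weights W (p x a), X scores T (n x a) and Y loadings C (q x a).
    """
    X = np.array(X, dtype=float)  # work on float copies; deflation modifies them
    Y = np.array(Y, dtype=float)
    n, p = X.shape
    W = np.zeros((p, n_components))
    T = np.zeros((n, n_components))
    C = np.zeros((Y.shape[1], n_components))
    for a in range(n_components):
        # Start the Y score from the Y column with the largest variance.
        u = Y[:, [int(np.argmax(Y.var(axis=0)))]]
        for _ in range(max_iter):
            w = X.T @ u
            w /= np.linalg.norm(w)          # unit-length X weight
            t = X @ w                       # X score
            c = Y.T @ t / (t.T @ t)         # Y loading
            u_new = Y @ c / (c.T @ c)       # updated Y score
            converged = np.linalg.norm(u_new - u) < tol
            u = u_new
            if converged:
                break
        p_load = X.T @ t / (t.T @ t)        # X loading
        X -= t @ p_load.T                   # deflate X
        Y -= t @ c.T                        # deflate Y
        W[:, a], T[:, a], C[:, a] = w.ravel(), t.ravel(), c.ravel()
    return W, T, C


def vip_scores(W, T, C):
    """VIP: squared normalized X weights, weighted by the Y sum of squares
    explained by each component (||t_a||^2 * ||c_a||^2)."""
    p = W.shape[0]
    ssy = (C ** 2).sum(axis=0) * (T ** 2).sum(axis=0)
    w2 = (W / np.linalg.norm(W, axis=0)) ** 2
    return np.sqrt(p * (w2 @ ssy) / ssy.sum())


def select_genes(X, labels, n_components=2, top_k=10):
    """Rank genes by VIP, coding the classes as one-hot columns of Y."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    Y = (labels[:, None] == classes[None, :]).astype(float)
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # assumes no constant genes
    Yc = Y - Y.mean(axis=0)
    W, T, C = pls_nipals(Xs, Yc, n_components)
    vip = vip_scores(W, T, C)
    return np.argsort(vip)[::-1][:top_k], vip


# Synthetic example: 3 classes x 15 samples, 60 genes, mostly noise.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1, 2], 15)
X = rng.normal(size=(labels.size, 60))
for k in range(3):
    X[labels == k, k] += 6.0  # gene k is up-regulated only in class k
top, vip = select_genes(X, labels, n_components=2, top_k=5)
print("top-ranked genes:", top)
```

In a full pipeline the top-ranked genes would then be passed to a linear-kernel SVC (for example scikit-learn's `SVC(kernel="linear")`). The values of `n_components` and `top_k` above are arbitrary, and when accuracy is estimated by cross-validation the ranking should be recomputed inside each training fold so that the selection does not see the test samples.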
Pages: 830-841
Page count: 12
Cited references: 34