Informative gene selection and the direct classification of tumors based on relative simplicity

Cited by: 25
Authors
Chen, Yuan [1, 2]
Wang, Lifeng [3]
Li, Lanzhi [2]
Zhang, Hongyan [2]
Yuan, Zheming [1, 2]
Affiliations
[1] Hunan Prov Key Lab Biol & Control Plant Dis & Ins, Changsha, Hunan, Peoples R China
[2] Hunan Agr Univ, Hunan Prov Key Lab Germplasm Innovat & Utilizat C, Changsha, Hunan, Peoples R China
[3] Hunan Acad Agr Sci, Biotechnol Res Ctr, Changsha, Hunan, Peoples R China
Keywords
Microarray expression data; Gene selection; Direct classify; Relative simplicity; Binary-discriminative informative genes; Paired votes; ACUTE LYMPHOBLASTIC-LEUKEMIA; ACUTE MYELOID-LEUKEMIA; HUMAN PROSTATE-CANCER; MOLECULAR CLASSIFICATION; EXPRESSION DATA; BLADDER-CANCER; CELL-CYCLE; BIOMARKER; PREDICTION; ALGORITHM;
DOI
10.1186/s12859-016-0893-0
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Selecting a parsimonious set of informative genes to build a classifier that generalizes well is the most important task in the analysis of tumor microarray expression data. Many existing gene-pair evaluation methods use only one strategy, either vertical comparison or horizontal comparison, and therefore cannot highlight the diverse patterns of gene pairs, while individual-gene-ranking methods ignore redundancy and synergy among genes.
Results: We propose a novel score measure named relative simplicity (RS). We evaluated gene pairs by integrating vertical comparison with horizontal comparison, and then built an RS-based direct classifier (RS-based DC) on a set of informative genes capable of binary discrimination, using a paired-votes strategy. Nine multi-class gene expression datasets involving human cancers were used to validate the performance of the new method. Compared with the nine reference models, RS-based DC achieved the highest average independent test accuracy (91.40%), the best generalization performance, and the smallest average number of informative genes (20.56). Compared with the four reference feature selection methods, RS also achieved the highest average test accuracy with three classifiers (Naive Bayes, k-Nearest Neighbor, and Support Vector Machine), and only RS improved the performance of SVM.
Conclusions: Integrating vertical comparison with horizontal comparison highlights the diverse patterns of gene pairs more fully. The DC core classifier can effectively control over-fitting. The RS-based feature selection method combined with the DC classifier can lead to more robust selection of informative genes and more robust classification accuracy.
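The abstract does not spell out how the paired-votes strategy works, so the following is only a minimal sketch of a generic one-vs-one voting scheme, which is an assumption about what such a strategy may resemble. The nearest-centroid rule used for each pair of classes is a stand-in and is not the RS-based direct classifier; the function names, class labels, and expression values are hypothetical toy inputs.

```python
from collections import Counter
from itertools import combinations


def centroid(rows):
    """Return the mean expression vector of a list of samples."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two expression vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def paired_vote_predict(train, sample):
    """Predict a class label by one-vs-one voting (illustrative only).

    train: dict mapping class label -> list of expression vectors.
    Each pair of classes casts one vote for the class whose centroid
    is closer to the sample; the class with the most votes wins.
    """
    centroids = {label: centroid(rows) for label, rows in train.items()}
    votes = Counter()
    for a, b in combinations(sorted(centroids), 2):
        closer_to_a = sq_dist(sample, centroids[a]) <= sq_dist(sample, centroids[b])
        votes[a if closer_to_a else b] += 1
    return votes.most_common(1)[0][0]


# Toy data: three classes, two genes per sample (values are made up).
train = {
    "ALL": [[1.0, 5.0], [1.2, 4.8]],
    "AML": [[5.0, 1.0], [4.8, 1.2]],
    "MLL": [[3.0, 3.0], [3.2, 2.8]],
}
prediction = paired_vote_predict(train, [4.9, 1.1])
```

With K classes this scheme runs K(K-1)/2 binary contests, so each contest only needs genes that separate two classes, which is one plausible reading of "informative genes capable of binary discrimination".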
Pages: 16