A feature selection approach for identification of signature genes from SAGE data

Cited by: 4
Authors
Barrera, Junior
Cesar, Roberto M.
Humes, Carlos
Martins, David C.
Patrao, Diogo F. C.
Silva, Paulo J. S.
Brentani, Helena
Affiliations
[1] Hospital do Câncer A.C. Camargo, São Paulo, Brazil
[2] Universidade de São Paulo, Instituto de Matemática e Estatística, São Paulo, Brazil
Funding
São Paulo Research Foundation (FAPESP), Brazil
Keywords
GLIOBLASTOMA-MULTIFORME; MICROARRAY ANALYSIS; EXPRESSION; CLASSIFICATION; GLIOMAS; SUBTYPES; TUMORS; PROFILES; CANCER; SETS
DOI
10.1186/1471-2105-8-169
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification code
071010; 081704
Abstract
Background: One goal of gene expression profiling is to identify signature genes that robustly distinguish different types or grades of tumors. Several tumor classifiers based on expression profiling have been proposed using microarray technology. Because the probabilistic models of microarray and SAGE measurements differ in important ways, suitable techniques are needed to select specific genes from SAGE data.

Results: A new framework is proposed for selecting specific genes that distinguish different biological states from SAGE data. The framework applies the bolstered error to identify strong genes that separate the biological states in a feature space defined by the gene expression of a training set. Credibility intervals derived from a probabilistic model of SAGE measurements are then used to find, among all gene groups selected by the strong-genes method, those that distinguish the states most reliably. A score that combines the credibility and the bolstered error values is proposed for ranking the candidate gene groups. Results obtained on SAGE data from gliomas are presented and corroborate the methodology.

Conclusion: A model of count data such as SAGE provides additional statistical information that allows a more robust analysis, and the proposed methodology incorporates the information supplied by this probabilistic model. The method is suitable for identifying signature genes that separate the biological states well using SAGE, and it may be adapted to other counting methods such as Massively Parallel Signature Sequencing (MPSS) or the recent Sequencing-By-Synthesis (SBS) technique. Some of the genes identified by the method may be useful for building classifiers.
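The abstract names two computational ingredients without spelling them out: bolstered error estimation (the bolstered resubstitution estimator of Braga-Neto and Dougherty) and credibility intervals derived from a probabilistic model of SAGE counts. The Python sketch below illustrates each under stated assumptions; it is not the authors' implementation. It assumes a linear classifier sign(w·x + b) over a small gene subset and a single kernel width `sigma` shared by all samples (the original estimator can choose widths per class), and it models a tag count as Binomial(library size, p) with a Beta prior on p, approximating an equal-tailed interval by Monte Carlo. The function names `bolstered_resub_error` and `tag_credibility_interval`, and all defaults, are hypothetical, and the paper's ranking score combining the two quantities is not reproduced.

```python
"""Hedged sketch of two ingredients named in the abstract (not the authors' code)."""
import random
from statistics import NormalDist

_PHI = NormalDist().cdf  # standard normal CDF


def bolstered_resub_error(X, y, w, b, sigma):
    """Bolstered resubstitution error of the linear classifier sign(w.x + b).

    Each training point x_i (label y_i in {-1, +1}) is replaced by a spherical
    Gaussian N(x_i, sigma^2 I). Its contribution is the Gaussian mass on the
    wrong side of the hyperplane, Phi(-margin_i / sigma), where margin_i is the
    signed distance y_i * (w.x_i + b) / ||w||.
    Assumption: one shared kernel width sigma for every sample.
    """
    norm_w = sum(wj * wj for wj in w) ** 0.5
    if norm_w == 0:
        raise ValueError("w must be nonzero")
    total = 0.0
    for xi, yi in zip(X, y):
        margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b) / norm_w
        total += _PHI(-margin / sigma)  # closed form for a linear boundary
    return total / len(X)


def tag_credibility_interval(count, library_size, level=0.95,
                             prior=(1.0, 1.0), draws=20000, seed=0):
    """Monte Carlo equal-tailed credibility interval for a tag's abundance p.

    Assumption: count ~ Binomial(library_size, p) with a Beta(a, b) prior,
    so the posterior is Beta(a + count, b + library_size - count).
    """
    a, b = prior
    rng = random.Random(seed)  # seeded for reproducibility
    samples = sorted(rng.betavariate(a + count, b + library_size - count)
                     for _ in range(draws))
    lo = samples[int((1 - level) / 2 * draws)]
    hi = samples[int((1 + level) / 2 * draws) - 1]
    return lo, hi
```

For a linear boundary, the Gaussian mass on the wrong side of the hyperplane has the closed form used above, so the error term needs no sampling; a nonlinear classifier would require numerical integration, for example by Monte Carlo. Note that four two-gene samples all classified correctly by the line x1 + x2 = 3.5 still receive a small nonzero bolstered error, because samples close to the boundary are penalized.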
Pages: 9