A framework to identify physiological responses in microarray-based gene expression studies: selection and interpretation of biologically relevant genes

Cited by: 32
Authors
Rodenburg, Wendy [1 ,2 ,3 ]
Heidema, A. Geert [4 ,5 ,6 ]
Boer, Jolanda M. A. [4 ]
Bovee-Oudenhoven, Ingeborg M. J. [2 ,3 ]
Feskens, Edith J. M. [6 ]
Mariman, Edwin C. M. [5 ]
Keijer, Jaap [1 ,2 ]
Affiliations
[1] Inst Food Safety, RIKILT, NL-6700 AE Wageningen, Netherlands
[2] TI Food & Nutr, Wageningen, Netherlands
[3] NIZO Food Res, Ede, Netherlands
[4] Natl Inst Publ Hlth & Environm RIVM, Bilthoven, Netherlands
[5] Maastricht Univ, Dept Human Biol, Maastricht, Netherlands
[6] Univ Wageningen & Res Ctr, Div Human Nutr, Wageningen, Netherlands
Keywords
gene selection; t-test; random forest; biological processes; transcriptomics
DOI
10.1152/physiolgenomics.00167.2007
Chinese Library Classification (CLC) number
Q2 [Cell Biology]
Discipline classification codes
071009; 090102
Abstract
In whole genome microarray studies major gene expression changes are easily identified, but it is a challenge to capture small, but biologically important, changes. Pathway-based programs can capture small effects but may have the disadvantage of being restricted to functionally annotated genes. A structured approach toward the identification of major and small changes for interpretation of biological effects is needed. We present a structured approach, a framework, that addresses different considerations in 1) the identification of informative genes in microarray data sets and 2) the interpretation of their biological relevance. The steps of this framework include gene ranking, gene selection, gene grouping, and biological interpretation. Random forests (RF), a method that takes gene-gene interactions into account, is examined to rank and select genes. For human, mouse, and rat whole genome arrays, fewer than half of the probes on the array are annotated. Consequently, pathway analysis tools ignore half of the information present in the microarray data set. The framework described takes all genes into account. RF is a useful tool to rank genes by taking interactions into account. Applying a permutation approach, we were able to define an objective threshold for gene selection. RF combined with self-organizing maps identified genes with coordinated but small gene expression responses that were not fully annotated but corresponded to the same biological process. The presented approach provides a flexible framework for biological interpretation of microarray data sets. It includes all genes in the data set, takes gene-gene interactions into account, and provides an objective threshold for gene selection.
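The abstract mentions a permutation approach for setting an objective gene-selection threshold but does not give its details. The sketch below illustrates the general idea only and is not the authors' procedure. It rests on several assumptions: a simple absolute difference in group means stands in for the random forest importance score, the null distribution is built from the maximum score over all genes in each label shuffle, and the function names (`score_genes`, `permutation_threshold`, `select_genes`), the 95% quantile, and the toy data are all hypothetical.

```python
import random
from statistics import mean


def score_genes(expr, labels):
    """Score each gene by |mean(group 0) - mean(group 1)|.

    ASSUMPTION: this is a stand-in for a random forest importance score,
    chosen only to keep the sketch dependency-free.
    """
    scores = []
    for gene in expr:
        group0 = [v for v, y in zip(gene, labels) if y == 0]
        group1 = [v for v, y in zip(gene, labels) if y == 1]
        scores.append(abs(mean(group0) - mean(group1)))
    return scores


def permutation_threshold(expr, labels, n_perm=200, quantile=0.95, seed=0):
    """Estimate a selection threshold from label permutations.

    For each shuffle of the class labels, record the largest gene score.
    The threshold is the chosen quantile of those maxima (ASSUMPTION:
    a max-statistic null; the paper may define its null differently).
    """
    rng = random.Random(seed)
    shuffled = list(labels)
    null_max = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null_max.append(max(score_genes(expr, shuffled)))
    null_max.sort()
    index = min(len(null_max) - 1, int(quantile * len(null_max)))
    return null_max[index]


def select_genes(expr, labels, **kwargs):
    """Rank genes by score and keep those above the permutation threshold."""
    threshold = permutation_threshold(expr, labels, **kwargs)
    scores = score_genes(expr, labels)
    ranked = sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)
    return [g for g in ranked if scores[g] > threshold], threshold


# Toy data (hypothetical): rows are genes, columns are 12 samples.
labels = [0] * 6 + [1] * 6
expr = [
    [0, 0, 0, 0, 0, 0, 10, 10, 10, 10, 10, 10],  # strongly class-dependent
    [1, 2, 1, 2, 1, 2, 2, 1, 2, 1, 2, 1],        # no class difference
    [3, 3, 4, 4, 3, 4, 4, 3, 3, 4, 4, 3],        # no class difference
]
selected, thr = select_genes(expr, labels)
print("selected gene indices:", selected, "threshold:", thr)
```

Swapping `score_genes` for an importance measure taken from a fitted random forest would bring the sketch closer to the ranking step named in the abstract, with the permutation logic unchanged.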
Pages: 78-90
Number of pages: 13