Strategies for aggregating gene expression data: The collapseRows R function
Cited by: 234
Authors:
Miller, Jeremy A. [1,2]
Cai, Chaochao [1,3]
Langfelder, Peter [1,3]
Geschwind, Daniel H. [1,4]
Kurian, Sunil M. [5]
Salomon, Daniel R. [5]
Horvath, Steve [1,3]
Affiliations:
[1] Univ Calif Los Angeles, Dept Human Genet, Los Angeles, CA 90095 USA
[2] Univ Calif Los Angeles, Interdept Program Neurosci, Los Angeles, CA USA
[3] Univ Calif Los Angeles, Dept Biostat, Los Angeles, CA USA
[4] Univ Calif Los Angeles, Dept Neurol, Los Angeles, CA 90024 USA
[5] Scripps Res Inst, Dept Mol & Expt Med, La Jolla, CA 92037 USA
Keywords:
DECONVOLUTION;
TRANSCRIPTOME;
ANNOTATION;
PATTERNS;
PACKAGE
DOI:
10.1186/1471-2105-12-322
Chinese Library Classification (CLC) number:
Q5 [Biochemistry]
Discipline classification codes:
071010;
081704
Abstract:
Background: Genomic and other high-dimensional analyses often require one to summarize multiple related variables by a single representative. This task is also variously referred to as collapsing, combining, reducing, or aggregating variables. Examples include summarizing several probe measurements corresponding to a single gene, representing the expression profiles of a co-expression module by a single expression profile, and aggregating cell-type marker information to de-convolute expression data. Several standard statistical summary techniques can be used, but network methods also provide useful alternative ways to find representatives. Currently, few collapsing functions have been developed and widely applied. Results: We introduce the R function collapseRows, which implements several collapsing methods, and evaluate its performance in three applications. First, we study a crucial step of the meta-analysis of microarray data: the merging of independent gene expression data sets, which may have been measured on different platforms. Toward this end, we collapse multiple microarray probes for a single gene and then merge the data by gene identifier. We find that choosing the probe with the highest average expression leads to the best between-study consistency. Second, we study methods for summarizing the gene expression profiles of a co-expression module. Several gene co-expression network analysis applications show that the optimal collapsing strategy depends on the analysis goal. Third, we study aggregating the information of cell type marker genes when the aim is to predict the abundance of cell types in a tissue sample based on gene expression data ("expression deconvolution"). We apply different collapsing methods to predict cell type abundances in peripheral human blood and in mixtures of blood cell lines. Interestingly, the most accurate prediction method involves choosing the most highly connected "hub" marker gene. 
Finally, to facilitate biological interpretation of collapsed gene lists, we introduce the function userListEnrichment, which assesses the enrichment of gene lists for known brain and blood cell type markers, and for other published biological pathways. Conclusions: The R function collapseRows implements several standard and network-based collapsing methods. In various genomic applications we provide evidence that both types of methods are robust and biologically relevant tools.
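The abstract names three ways to collapse a group of rows into one representative: keep the probe with the highest average expression, summarize a module by a single profile, or keep the most highly connected "hub" row. The Python sketch below illustrates these ideas on toy matrices (rows = probes or genes, columns = samples). It is not the collapseRows R function or its API. The names `collapse_max_mean`, `hub_row`, and `eigengene` are hypothetical. Connectivity is assumed here to be the sum of absolute Pearson correlations, and the module summary is assumed to be the first principal component of the standardized rows (an "eigengene"). The paper's exact definitions may differ.

```python
import numpy as np


def collapse_max_mean(expr, groups):
    """Map each group label (e.g. a gene) to the index of the row (e.g. a
    probe) with the highest mean across columns (samples)."""
    expr = np.asarray(expr, dtype=float)
    chosen = {}
    for g in dict.fromkeys(groups):  # unique labels in first-seen order
        idx = [i for i, label in enumerate(groups) if label == g]
        row_means = expr[idx].mean(axis=1)
        chosen[g] = idx[int(np.argmax(row_means))]
    return chosen


def hub_row(expr, idx):
    """Return the row in idx with the highest connectivity. ASSUMPTION:
    connectivity = sum of absolute Pearson correlations with the other rows."""
    if len(idx) == 1:
        return idx[0]
    sub = np.asarray(expr, dtype=float)[idx]
    corr = np.abs(np.corrcoef(sub))        # rows are treated as variables
    connectivity = corr.sum(axis=1) - 1.0  # remove self-correlation (= 1)
    return idx[int(np.argmax(connectivity))]


def eigengene(expr, idx):
    """ASSUMPTION: summarize rows idx by the first principal component of
    their standardized values (one value per sample; the sign is arbitrary).
    Assumes no row in idx is constant."""
    sub = np.asarray(expr, dtype=float)[idx]
    z = (sub - sub.mean(axis=1, keepdims=True)) / sub.std(axis=1, keepdims=True)
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    return vt[0]


if __name__ == "__main__":
    # Toy data: 4 probes x 4 samples, two probes per hypothetical gene.
    probes = np.array([
        [1.0, 2.0, 3.0, 4.0],
        [5.0, 6.0, 7.0, 8.0],
        [2.0, 2.0, 2.5, 2.0],
        [0.5, 0.4, 0.6, 0.5],
    ])
    print(collapse_max_mean(probes, ["A", "A", "B", "B"]))  # {'A': 1, 'B': 2}

    # Toy module: 3 genes x 5 samples; the middle row is the hub.
    module = np.array([
        [1.0, 2.0, 3.0, 5.0, 4.0],
        [1.0, 2.0, 3.0, 4.0, 5.0],
        [2.0, 1.0, 3.0, 4.0, 5.0],
    ])
    print(hub_row(module, [0, 1, 2]))  # 1
    print(eigengene(module, [0, 1, 2]))
```

Choosing one representative row keeps an actually observed profile, whereas the eigengene is a weighted combination of all rows in the group; as the abstract notes, which strategy works best depends on the analysis goal.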
Pages: 13