Two-way analysis of high-dimensional collinear data

Cited by: 0
Authors
Ilkka Huopaniemi
Tommi Suvitaival
Janne Nikkilä
Matej Orešič
Samuel Kaski
Affiliations
[1] Helsinki University of Technology (TKK), Department of Information and Computer Science
[2] University of Helsinki, Department of Basic Veterinary Sciences (Division of Microbiology and Epidemiology), Faculty of Veterinary Medicine
[3] VTT Technical Research Centre of Finland (VTT)
Source
Data Mining and Knowledge Discovery | 2009 / Vol. 19
Keywords
ANOVA; Factor analysis; Hierarchical model; Metabolomics; Multi-way analysis; Small sample-size
DOI
Not available
Abstract
We present a Bayesian model for two-way ANOVA-type analysis of high-dimensional, small sample-size datasets with highly correlated groups of variables. Modern cellular measurement methods are a main application area; typically the task is differential analysis between diseased and healthy samples, complicated by additional covariates requiring a multi-way analysis. The main complication is the combination of high dimensionality and low sample size, which renders classical multivariate techniques useless. We introduce a hierarchical model which does dimensionality reduction by assuming that the input variables come in similarly-behaving groups, and performs an ANOVA-type decomposition for the set of reduced-dimensional latent variables. We apply the methods to study lipidomic profiles of a recent large-cohort human diabetes study.
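The structure described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' implementation and performs no Bayesian inference: it assumes a simplified generative form in which each observed variable belongs to exactly one group, each group is driven by one latent variable, and each latent variable has a two-way ANOVA-type mean (two main effects plus an interaction). All function names, parameters, and effect sizes are hypothetical. In place of posterior inference it uses a plain plug-in estimate (group averages followed by cell-mean contrasts), and it treats the grouping as known, which the model in the paper does not require.

```python
import numpy as np


def simulate_two_way_data(n_per_cell=5, n_groups=3, vars_per_group=20,
                          noise_sd=0.5, seed=0):
    """Simulate grouped, collinear data with two binary covariates a and b.

    Assumed (hypothetical) form: z = a*alpha + b*beta + a*b*ab + noise,
    x = z @ V + noise, where V assigns each variable to one group.
    """
    rng = np.random.default_rng(seed)
    # Balanced 2 x 2 design: cells (0,0), (0,1), (1,0), (1,1).
    a = np.repeat([0, 0, 1, 1], n_per_cell)
    b = np.repeat([0, 1, 0, 1], n_per_cell)
    n = a.size
    # Hypothetical effect sizes, one value per latent variable.
    true = {
        "a": rng.normal(0.0, 1.0, n_groups),
        "b": rng.normal(0.0, 1.0, n_groups),
        "ab": rng.normal(0.0, 1.0, n_groups),
    }
    # Latent variables with an ANOVA-type mean and unit-variance noise.
    z = (np.outer(a, true["a"]) + np.outer(b, true["b"])
         + np.outer(a * b, true["ab"])
         + rng.normal(size=(n, n_groups)))
    # Binary group-membership matrix V (n_groups x n_variables).
    labels = np.repeat(np.arange(n_groups), vars_per_group)
    V = np.zeros((n_groups, labels.size))
    V[labels, np.arange(labels.size)] = 1.0
    # Variables in the same group share a latent variable, hence collinearity.
    x = z @ V + noise_sd * rng.normal(size=(n, labels.size))
    return x, a, b, labels, true


def group_mean_effects(x, a, b, labels):
    """Plug-in estimate: average each group, then take cell-mean contrasts."""
    n_groups = labels.max() + 1
    z_hat = np.column_stack(
        [x[:, labels == g].mean(axis=1) for g in range(n_groups)]
    )

    def cell(i, j):
        return z_hat[(a == i) & (b == j)].mean(axis=0)

    m00, m01, m10, m11 = cell(0, 0), cell(0, 1), cell(1, 0), cell(1, 1)
    return {"a": m10 - m00, "b": m01 - m00, "ab": m11 - m10 - m01 + m00}


if __name__ == "__main__":
    x, a, b, labels, true = simulate_two_way_data()
    est = group_mean_effects(x, a, b, labels)
    for name in ("a", "b", "ab"):
        print(name, "true:", np.round(true[name], 2),
              "estimated:", np.round(est[name], 2))
```

With only a few samples per cell the plug-in estimates are noisy, but the dimensionality of the problem drops from the number of variables (60 here) to the number of groups (3), which is the motivation the abstract gives for working with reduced-dimensional latent variables.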
Pages: 261-276
Number of pages: 15