A tractable probabilistic model for Affymetrix probe-level analysis across multiple chips

Cited by: 58
Authors
Liu, XJ
Milo, M
Lawrence, ND
Rattray, M
Affiliations
[1] Univ Manchester, Sch Comp Sci, Manchester M13 9PL, Lancs, England
[2] Univ Sheffield, Dept Comp Sci, Sheffield S1 4DP, S Yorkshire, England
Funding
Wellcome Trust (UK); Biotechnology and Biological Sciences Research Council (UK)
DOI
10.1093/bioinformatics/bti583
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Affymetrix GeneChip® arrays are currently the most widely used microarray technology. Many summarization methods have been developed to provide gene expression levels from Affymetrix probe-level data. Most of the currently popular methods do not provide a measure of uncertainty for the expression level of each gene. The use of probabilistic models can overcome this limitation. A full hierarchical Bayesian approach requires the use of computationally intensive MCMC methods that are impractical for large datasets. An alternative, computationally efficient probabilistic model, mgMOS, uses Gamma distributions to model specific and non-specific binding, with a latent variable to capture variations in probe affinity. Although promising, this model has two main limitations: it does not use information from multiple chips, and it does not account for specific binding to the mismatch (MM) probes. Results: We extend mgMOS to model the binding affinity of probe-pairs across multiple chips and to capture the effect of specific binding to MM probes. The new model, multi-mgMOS, provides improved accuracy, as demonstrated on some benchmark datasets and a real time-course dataset, and is much more computationally efficient than a competing hierarchical Bayesian approach that requires MCMC sampling. We demonstrate how the probabilistic model can be used to estimate credibility intervals for expression levels and their log-ratios between conditions.
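The abstract names the ingredients of the model (Gamma-distributed specific and non-specific binding, a latent probe-affinity variable shared across chips, and some specific binding to MM probes) but does not give the equations. The sketch below is one possible generative reading of those ingredients, not the authors' implementation: the parameter names (`alpha`, `a`, `phi`, `c`, `d`), the exact parameterization, and all default values are assumptions made for illustration.

```python
import random


def simulate_probe_set(alpha, a, phi=0.3, c=2.0, d=1.0, n_probes=11, seed=0):
    """Draw PM/MM intensities for one probe-set across several chips.

    Assumed (hypothetical) parameterization:
      alpha[k] : Gamma shape for specific binding on chip k
      a[k]     : Gamma shape for non-specific binding on chip k
      phi      : fraction of the specific signal that also binds MM probes
      c, d     : shape and rate of the Gamma prior on the latent
                 probe-pair variable b_j, which is shared by all chips
    """
    rng = random.Random(seed)
    # One latent rate per probe pair; sharing it across chips is what
    # lets the model pool probe-affinity information between chips.
    # random.gammavariate takes (shape, scale), so a rate is passed as 1/rate.
    b = [rng.gammavariate(c, 1.0 / d) for _ in range(n_probes)]
    pm, mm = [], []
    for b_j in b:
        # PM sees non-specific plus full specific binding.
        pm.append([rng.gammavariate(a_k + al_k, 1.0 / b_j)
                   for a_k, al_k in zip(a, alpha)])
        # MM sees non-specific plus a fraction phi of specific binding.
        mm.append([rng.gammavariate(a_k + phi * al_k, 1.0 / b_j)
                   for a_k, al_k in zip(a, alpha)])
    return pm, mm, b


# Three chips with increasing (made-up) specific-binding signal.
pm, mm, b = simulate_probe_set(alpha=[5.0, 10.0, 20.0], a=[2.0, 2.0, 2.0])
print(len(pm), len(pm[0]))
```

Each row of `pm` and `mm` corresponds to one probe pair and each column to one chip. In a fitted model these parameters would be estimated from data rather than fixed by hand.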
Pages: 3637-3644
Page count: 8