SEMI-SUPERVISED NONPARAMETRIC BAYESIAN MODELLING OF SPATIAL PROTEOMICS

Cited by: 1
Authors
Crook, Oliver M. [1, 2]
Lilley, Kathryn S. [2]
Gatto, Laurent [3 ]
Kirk, Paul D. W. [1 ]
Affiliations
[1] Univ Cambridge, Sch Clin Med, MRC Biostat Unit, Cambridge, England
[2] Univ Cambridge, Cambridge Ctr Prote, Dept Biochem, Cambridge, England
[3] UCLouvain, de Duve Inst, Brussels, Belgium
Funding
Biotechnology and Biological Sciences Research Council (UK); Wellcome Trust (UK);
Keywords
Proteomics; Bayesian mixture models; semi-supervised learning; PROTEIN SUBCELLULAR-LOCALIZATION; DIFFERENTIAL GENE-EXPRESSION; PLANT GOLGI-APPARATUS; GAUSSIAN-PROCESSES; ARABIDOPSIS-THALIANA; ORGANELLE PROTEOME; TIME; CLASSIFICATION; MISLOCALIZATION; IDENTIFICATION;
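DOI
10.1214/22-AOAS1603
Chinese Library Classification (CLC)
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Subject classification codes
020208 ; 070103 ; 0714 ;
Abstract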
Understanding subcellular protein localisation is an essential component in the analysis of context-specific protein function. Recent advances in quantitative mass-spectrometry (MS) have led to high-resolution mapping of thousands of proteins to subcellular locations within the cell. Novel modelling considerations to capture the complex nature of these data are thus necessary. We approach analysis of spatial proteomics data in a nonparametric Bayesian framework, using K-component mixtures of Gaussian process regression models. The Gaussian process regression model accounts for correlation structure within a subcellular niche, with each mixture component capturing the distinct correlation structure observed within each niche. The availability of marker proteins (i.e., proteins with a priori known labelled locations) motivates a semi-supervised learning approach to inform the Gaussian process hyperparameters. We moreover provide an efficient Hamiltonian-within-Gibbs sampler for our model. Furthermore, we reduce the computational burden associated with inversion of covariance matrices by exploiting the structure in the covariance matrix. A tensor decomposition of our covariance matrices allows extended Trench and Durbin algorithms to be applied to reduce the computational complexity of inversion and hence accelerate computation. We provide detailed case studies on Drosophila embryos and mouse pluripotent embryonic stem cells to illustrate the benefit of semi-supervised functional Bayesian modelling of the data.
Pages: 2554-2576
Page count: 23