PREDICTING PALEOCLIMATE FROM COMPOSITIONAL DATA USING MULTIVARIATE GAUSSIAN PROCESS INVERSE PREDICTION

Times cited: 2
Authors
Tipton, John R. [1]
Hooten, Mevin B. [2, 3]
Nolan, Connor [4 ]
Booth, Robert K. [5 ]
McLachlan, Jason [6 ]
Affiliations
[1] Univ Arkansas, Dept Math Sci, Fayetteville, AR 72701 USA
[2] Colorado State Univ, Dept Stat, Ft Collins, CO 80523 USA
[3] US Geol Survey, Colorado Cooperat Fish & Wildlife Res Unit, Dept Fish Wildlife & Conservat Biol, Ft Collins, CO 80523 USA
[4] Univ Arizona, Dept Geosci, Tucson, AZ 85721 USA
[5] Lehigh Univ, Earth & Environm Sci Dept, Bethlehem, PA 18015 USA
[6] Univ Notre Dame, Dept Biol, Notre Dame, IN 46556 USA
Funding
U.S. National Science Foundation
Keywords
Bayesian hierarchical models; predictive validation; model comparison; ecological functional response model; BAYESIAN-INFERENCE; FOREST COMPOSITION; MODEL; PH; DIATOMS; RECONSTRUCTION; CALIBRATION; LIKELIHOOD; REGRESSION; ANALOGS;
DOI
10.1214/19-AOAS1281
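Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification codes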
020208 ; 070103 ; 0714 ;
Abstract
Multivariate compositional count data arise in many applications including ecology, microbiology, genetics and paleoclimate. A frequent question in the analysis of multivariate compositional count data is what underlying values of a covariate(s) give rise to the observed composition. Learning the relationship between covariates and the compositional count allows for inverse prediction of unobserved covariates given compositional count observations. Gaussian processes provide a flexible framework for modeling functional responses with respect to a covariate without assuming a functional form. Many scientific disciplines use Gaussian process approximations to improve prediction and make inference on latent processes and parameters. When prediction is desired on unobserved covariates given realizations of the response variable, this is called inverse prediction. Because inverse prediction is often mathematically and computationally challenging, predicting unobserved covariates often requires fitting models that are different from the hypothesized generative model. We present a novel computational framework that allows for efficient inverse prediction using a Gaussian process approximation to generative models. Our framework enables scientific learning about how the latent processes co-vary with respect to covariates while simultaneously providing predictions of missing covariates. The proposed framework is capable of efficiently exploring the high dimensional, multi-modal latent spaces that arise in the inverse problem. To demonstrate flexibility, we apply our method in a generalized linear model framework to predict latent climate states given multivariate count data. Based on cross-validation, our model has predictive skill competitive with current methods while simultaneously providing formal, statistical inference on the underlying community dynamics of the biological system previously not available.
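The idea of inverse prediction described in the abstract can be illustrated with a minimal sketch. This is not the authors' framework: it assumes the species response functions are already known (fixed, hypothetical Gaussian-shaped curves with made-up `optima` and `widths`) rather than learned with a Gaussian process approximation, and it replaces their sampler with a simple grid evaluation of the posterior under a uniform prior on the covariate. All function and variable names below are illustrative.

```python
import numpy as np

def composition(x, optima, widths):
    """Map covariate values x to species proportions via a softmax of latent responses."""
    x = np.atleast_1d(x)[:, None]
    # Hypothetical unimodal latent responses (a stand-in for GP-modeled responses).
    latent = -0.5 * ((x - optima) / widths) ** 2
    latent -= latent.max(axis=1, keepdims=True)  # stabilize the softmax
    p = np.exp(latent)
    return p / p.sum(axis=1, keepdims=True)

def inverse_predict(counts, grid, optima, widths):
    """Posterior over the unobserved covariate on a grid, given observed counts.

    Assumes a multinomial likelihood and a uniform prior over the grid.
    """
    probs = composition(grid, optima, widths)   # shape (grid points, species)
    log_post = counts @ np.log(probs).T         # multinomial log-likelihood, up to a constant
    log_post -= log_post.max()
    post = np.exp(log_post)
    return post / post.sum()

# Simulate one compositional count observation at a known covariate value.
rng = np.random.default_rng(0)
optima = np.array([-2.0, 0.0, 2.0])   # assumed species optima
widths = np.array([1.0, 1.0, 1.0])    # assumed species tolerances
grid = np.linspace(-4.0, 4.0, 401)
x_true = 1.0
counts = rng.multinomial(500, composition(x_true, optima, widths)[0])

# Invert: recover the covariate from the counts alone.
post = inverse_predict(counts, grid, optima, widths)
x_hat = float(np.sum(grid * post))    # posterior mean of the covariate
print(f"true covariate: {x_true}, posterior mean: {x_hat:.3f}")
```

A grid works here only because the covariate is one-dimensional and the response curves are treated as known; when the responses must be estimated jointly with many missing covariates, the latent space becomes high dimensional and multi-modal, which is the setting the abstract addresses.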
Pages: 2363-2388
Page count: 26
Related papers
63 in total
  • [11] DIATOMS AND PH RECONSTRUCTION
    BIRKS, HJB
    LINE, JM
    JUGGINS, S
    STEVENSON, AC
    TER BRAAK, CJF
    [J]. PHILOSOPHICAL TRANSACTIONS OF THE ROYAL SOCIETY B-BIOLOGICAL SCIENCES, 1990, 327 (1240) : 263 - 278
  • [12] Booth RK., 2010, Mires and Peat, V7, P1
  • [13] Testate amoebae as proxies for mean annual water-table depth in Sphagnum-dominated peatlands of North America
    Booth, Robert K.
    [J]. JOURNAL OF QUATERNARY SCIENCE, 2008, 23 (01) : 43 - 57
  • [14] Paleoecoinformatics: applying geohistorical data to ecological questions
    Brewer, Simon
    Jackson, Stephen T.
    Williams, John W.
    [J]. TRENDS IN ECOLOGY & EVOLUTION, 2012, 27 (02) : 104 - 112
  • [15] Summer water deficit variability controls on peatland water-table changes: implications for Holocene palaeoclimate reconstructions
    Charman, Dan J.
    [J]. HOLOCENE, 2007, 17 (02) : 217 - 227
  • [16] Fixed rank kriging for very large spatial data sets
    Cressie, Noel
    Johannesson, Gardar
    [J]. JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY, 2008, 70 : 209 - 226
  • [17] Csato L., 2002, THESIS
  • [18] Hierarchical Nearest-Neighbor Gaussian Process Models for Large Geostatistical Datasets
    Datta, Abhirup
    Banerjee, Sudipto
    Finley, Andrew O.
    Gelfand, Alan E.
    [J]. JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2016, 111 (514) : 800 - 812
  • [19] Quantifying pollen-vegetation relationships to reconstruct ancient forests using 19th-century forest composition and pollen data
    Dawson, Andria
    Paciorek, Christopher J.
    McLachlan, Jason S.
    Goring, Simon
    Williams, John W.
    Jackson, Stephen T.
    [J]. QUATERNARY SCIENCE REVIEWS, 2016, 137 : 156 - 175
  • [20] RcppArmadillo: Accelerating R with high-performance C++ linear algebra
    Eddelbuettel, Dirk
    Sanderson, Conrad
    [J]. COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2014, 71 : 1054 - 1063