Automatic cross-validation in structured models: Is it time to leave out leave-one-out?

Cited by: 7
Authors
Adin, Aritz [1, 2]
Krainski, Elias Teixeira [1, 3]
Lenzi, Amanda [1, 4]
Liu, Zhedong [1, 5]
Martinez-Minaya, Joaquin [1, 6]
Rue, Havard [1, 3]
Affiliations
[1] Univ Publ Navarra, Campus Arrosadia, Pamplona 31006, Spain
[2] Univ Publ Navarra, Inst Adv Mat & Math InaMat2, Dept Stat Comp Sci & Math, Pamplona, Spain
[3] King Abdullah Univ Sci & Technol KAUST, Stat Program, Comp Elect & Math Sci & Engn Div, Thuwal, Saudi Arabia
[4] Univ Edinburgh, Sch Math, Edinburgh, Scotland
[5] RIKEN Ctr AI Project, Tokyo, Japan
[6] Univ Politecn Valencia, Dept Appl Stat Operat Res & Qual, Valencia, Spain
Keywords
Cross-validation; Hierarchical models; INLA; Spatial statistics; COMPOSITIONAL DATA-ANALYSIS; EVOLUTION; JOINT
DOI
10.1016/j.spasta.2024.100843
Chinese Library Classification number
P [Astronomy, Earth Sciences]
Subject classification code
07
Abstract
Standard techniques such as leave-one-out cross-validation (LOOCV) might not be suitable for evaluating the predictive performance of models incorporating structured random effects. In such cases, the correlation between the training and test sets could have a notable impact on the model's prediction error. To overcome this issue, an automatic group construction procedure for leave-group-out cross-validation (LGOCV) has recently emerged as a valuable tool for enhancing predictive performance measurement in structured models. The purpose of this paper is (i) to compare LOOCV and LGOCV within structured models, emphasizing model selection and predictive performance, and (ii) to provide real data applications in spatial statistics using complex structured models fitted with INLA, showcasing the utility of the automatic LGOCV method. First, we briefly review the key aspects of the recently proposed LGOCV method for automatic group construction in latent Gaussian models. We also demonstrate the effectiveness of this method for selecting the model with the highest predictive performance by simulating extrapolation tasks in both temporal and spatial data analyses. Finally, we provide insights into the effectiveness of the LGOCV method in modeling complex structured data, encompassing spatio-temporal multivariate count data, spatial compositional data, and spatio-temporal geospatial data.
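The abstract's point about correlated training and test sets can be illustrated with a small simulation. The sketch below is not the paper's automatic group construction and does not use INLA; it is a minimal illustration under stated assumptions: a simulated AR(1) series, a known autoregressive coefficient `phi`, a simplified predictor that uses only the nearest retained earlier observation, and hand-picked fixed-width groups. The function names `simulate_ar1` and `cv_mse` are hypothetical.

```python
import random


def simulate_ar1(n, phi, seed=0):
    """Simulate a stationary AR(1) series x_t = phi * x_{t-1} + e_t, e_t ~ N(0, 1)."""
    rng = random.Random(seed)
    # Draw the first value from the stationary distribution N(0, 1 / (1 - phi^2)).
    x = [rng.gauss(0.0, 1.0) / (1.0 - phi ** 2) ** 0.5]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x


def cv_mse(x, phi, radius):
    """Mean squared error when x[t] is predicted from the closest earlier
    observation outside the left-out group {t - radius, ..., t + radius}.

    radius = 0 leaves out a single point (a simplified leave-one-out);
    radius > 0 leaves out a fixed-width group around the test point
    (an assumed, hand-picked group, not an automatically constructed one).
    """
    lag = radius + 1  # distance to the nearest retained earlier observation
    errors = [(x[t] - phi ** lag * x[t - lag]) ** 2 for t in range(lag, len(x))]
    return sum(errors) / len(errors)


phi = 0.9
series = simulate_ar1(5000, phi, seed=1)
loo = cv_mse(series, phi, radius=0)
lgo = cv_mse(series, phi, radius=5)
print(f"leave-one-out MSE:   {loo:.3f}")
print(f"leave-group-out MSE: {lgo:.3f}")
```

With strongly correlated neighbours, the single-point scheme reports a smaller error than the grouped scheme, because the test point's immediate neighbour stays in the training set. If the real task is extrapolation several steps away from the data, the grouped error is the more relevant estimate.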
Pages: 17