Spatiotemporal modelling of airborne birch and grass pollen concentration across Switzerland: A comparison of statistical, machine learning and ensemble methods

Times cited: 1
Authors
Shokouhi, Behzad Valipour [1,2]
de Hoogh, Kees [1,2]
Gehrig, Regula [3]
Eeftens, Marloes [1,2]
Affiliations
[1] Swiss Tropical and Public Health Institute, Allschwil, Switzerland
[2] University of Basel, Basel, Switzerland
[3] Federal Office of Meteorology and Climatology MeteoSwiss, Zurich, Switzerland
Funding
European Research Council; Swiss National Science Foundation
Keywords
Environmental stressors; Pollen; Machine learning; Land use regression; Spatiotemporal models; Exposure assessment; Pollution; Corylus; Alnus
DOI
10.1016/j.envres.2024.119999
Chinese Library Classification
X (Environmental science, safety science)
Subject classification code
08; 0830
Abstract
Background: Statistical and machine learning models are commonly used to estimate spatial and temporal variability in exposure to environmental stressors in support of epidemiological studies. We aimed to compare the performance, strengths and limitations of six algorithms for the retrospective spatiotemporal modelling of daily birch and grass pollen concentrations at a spatial resolution of 1 km across Switzerland.
Methods: Daily birch and grass pollen concentrations were available from 14 measurement sites in Switzerland for 2000-2019. To develop the spatiotemporal models, we considered spatiotemporal, spatial and temporal predictors, including meteorological factors, land use, elevation, species distribution and the Normalized Difference Vegetation Index (NDVI). We used six statistical and machine learning algorithms: LASSO, Ridge, Elastic net, Random forest, XGBoost and artificial neural networks (ANNs). We optimized model structures through feature selection and grid search to obtain the best predictive performance, and we used a train-test split and cross-validation to avoid overfitting and over-optimistic performance indicators. We then combined the six models through multiple linear regression to develop an ensemble hybrid model.
Results: The 5th to 95th percentile ranges of birch and grass pollen concentrations were 0-151 and 0-105 grains/m³, respectively. The ensemble hybrid model achieved the lowest root-mean-square error (RMSE) on the test dataset for both birch and grass pollen, at 94.4 and 19.7 grains/m³, respectively, outperforming all six individual linear and nonlinear algorithms. Nonlinear models (Random forest, XGBoost and ANNs) achieved lower test RMSEs than linear models (LASSO, Ridge, Elastic net) for both pollen types, with RMSEs ranging from 105.9 to 140.5 grains/m³ for birch and from 20.0 to 25.4 grains/m³ for grass pollen. Among the six individual algorithms, Random forest yielded the best spatial and temporal performance. Country-wide pollen concentration, land use, weather and NDVI were important predictors.
Conclusion: Nonlinear algorithms outperformed linear models and accurately explained complex, nonlinear relationships between environmental factors and measured concentrations.
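The final modelling step described above, combining six base models through multiple linear regression, is a form of stacking. The sketch below is a minimal illustration under stated assumptions. It uses NumPy and assumes that the regression is fit on held-out (for example out-of-fold) base-model predictions rather than on each model's own training fit. The function names `rmse`, `fit_stacking_weights` and `predict_stacked` are hypothetical and are not taken from the paper. The data are synthetic, and the two "base models" stand in for the six algorithms.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root-mean-square error, in the units of y (e.g. grains/m^3)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def fit_stacking_weights(base_preds, y):
    """Fit y ~ b0 + sum_k b_k * pred_k by ordinary least squares.

    base_preds: array of shape (n_samples, n_models), one column per base model.
    Assumption: these are held-out (out-of-fold) predictions, so the weights
    are not fit to each base model's own training error.
    Returns the coefficient vector [b0, b1, ..., b_K].
    """
    base_preds = np.asarray(base_preds, dtype=float)
    X = np.column_stack([np.ones(base_preds.shape[0]), base_preds])
    coef, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return coef


def predict_stacked(base_preds, coef):
    """Apply the fitted intercept and per-model weights to new base predictions."""
    base_preds = np.asarray(base_preds, dtype=float)
    X = np.column_stack([np.ones(base_preds.shape[0]), base_preds])
    return X @ coef


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 500
    # Synthetic, right-skewed, non-negative "daily concentrations" (not real data).
    y = rng.gamma(shape=0.5, scale=40.0, size=n)
    # Two hypothetical base models: one biased low, one unbiased but noisier.
    preds = np.column_stack([
        0.7 * y + rng.normal(0.0, 10.0, n),
        y + rng.normal(0.0, 25.0, n),
    ])
    fit_rows, test_rows = slice(0, 400), slice(400, n)
    coef = fit_stacking_weights(preds[fit_rows], y[fit_rows])
    for k in range(preds.shape[1]):
        print(f"base model {k}: test RMSE = {rmse(y[test_rows], preds[test_rows, k]):.2f}")
    stacked = predict_stacked(preds[test_rows], coef)
    print(f"stacked:      test RMSE = {rmse(y[test_rows], stacked):.2f}")
    print("intercept and weights:", np.round(coef, 3))
```

The weights are least-squares coefficients with an intercept. The ensemble can therefore correct a base model's systematic bias as well as weight the models against each other. On the rows used to fit the weights, the stacked RMSE can never exceed that of the best single base model, because giving that model weight 1 and the others 0 is one feasible solution. On a separate test set this outcome is likely but not guaranteed. A linear combination can also produce small negative values, which one might clip at zero for non-negative pollen counts.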
Pages: 11
References: 58