Quantifying uncertainty in land cover mappings: An adaptive approach to sampling reference data using Bayesian inference

Cited by: 0
Authors
Phillipson, Jordan [1]
Blair, Gordon [1]
Henrys, Peter [2]
Affiliations
[1] Univ Lancaster, Sch Comp & Commun, Lancaster, England
[2] Lancaster Off, UK Ctr Ecol & Hydrol, Lancaster, England
Source
ENVIRONMENTAL DATA SCIENCE | 2022 / Vol. 1
Funding
Engineering and Physical Sciences Research Council (UK);
Keywords
Bayesian; land cover maps; reference sampling; sample design; uncertainty; REGRESSION; MIXTURES; PRIORS;
DOI
10.1017/eds.2022.14
Chinese Library Classification number
X [Environmental science, safety science];
Subject classification code
08 ; 0830 ;
Abstract
Mappings play an important role in environmental science applications by allowing practitioners to monitor changes at national and global scales. Over the last decade, it has become increasingly popular to use satellite imagery data and machine learning techniques (MLTs) to construct such maps. Given the black-box nature of many of these MLTs, quantifying uncertainty in these maps often relies on sampling reference data under stricter conditions. However, practical constraints can make sampling such data expensive, which forces stakeholders to make a trade-off between the degree of uncertainty in predictions and the costs of collecting appropriately sampled reference data. Furthermore, quantifying any trade-off is often difficult, as it will depend on many interdependent factors that cannot be fully understood until more data is collected. This paper investigates how a combination of Bayesian inference and an adaptive approach to sampling reference data can offer a generalizable way of managing such trade-offs. The approach is illustrated and evaluated using a woodland mapping of England as a case study in which reference data is collected under constraints motivated by COVID-19 travel restrictions. The key findings of this paper are as follows: (a) an adaptive approach to sampling reference data allows an informed approach when quantifying this trade-off; and (b) Bayesian inference is naturally suited to adaptive sampling and can make use of Monte Carlo methods when dealing with more advanced problems and analytical techniques.
Impact Statement
As practitioners look toward more automated procedures of generating maps with machine learning techniques (MLTs), many uncertainty quantification methods rely on a separate set of reference data from well-structured sample designs, which can be expensive due to accessibility issues. This work provides a substantial step toward the goal of using adaptive sampling to effectively manage the balance between costs and uncertainty when sampling reference data under design constraints. Whilst this work focuses on the domain of land cover mappings, many of the results here easily transfer to other applications involving uncertainty quantification in MLTs, as the framework is agnostic to the choice of MLT, the model used to quantify uncertainty, and the propensity scoring used in targeted sampling.
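The general idea of pairing Bayesian inference with adaptive sampling can be illustrated with a minimal sketch. This is not the model used in the paper, which the abstract does not specify. It assumes a simple Beta-Binomial model for overall map accuracy, a uniform prior, and a stopping rule based on the width of a Monte Carlo credible interval. All function names, the batch size, the target width, and the simulated accuracy of 0.85 are hypothetical choices for illustration only.

```python
import random


def credible_interval(alpha, beta, level=0.95, draws=20000, rng=None):
    """Monte Carlo estimate of an equal-tailed credible interval for Beta(alpha, beta)."""
    rng = rng or random.Random(0)
    samples = sorted(rng.betavariate(alpha, beta) for _ in range(draws))
    lo = samples[int((1 - level) / 2 * draws)]
    hi = samples[int((1 + level) / 2 * draws) - 1]
    return lo, hi


def adaptive_sampling(collect_batch, batch_size=20, target_width=0.10,
                      max_samples=400, seed=0):
    """Collect reference data in batches until the interval is narrow enough.

    collect_batch(batch_size) is a hypothetical callback that returns how many
    of the newly sampled reference points agreed with the map.
    """
    rng = random.Random(seed)
    alpha, beta = 1.0, 1.0  # assumed uniform Beta(1, 1) prior on map accuracy
    n = 0
    history = []
    while n < max_samples:
        correct = collect_batch(batch_size)
        # Conjugate update: the posterior after this batch is the next prior.
        alpha += correct
        beta += batch_size - correct
        n += batch_size
        lo, hi = credible_interval(alpha, beta, rng=rng)
        history.append((n, lo, hi))
        if hi - lo <= target_width:  # uncertainty is acceptable: stop paying for data
            break
    return alpha, beta, history


def make_simulated_collector(true_accuracy=0.85, seed=1):
    """Stand-in for field work: simulates agreement between map and reference."""
    rng = random.Random(seed)

    def collect(batch_size):
        return sum(rng.random() < true_accuracy for _ in range(batch_size))

    return collect


if __name__ == "__main__":
    a, b, hist = adaptive_sampling(make_simulated_collector())
    for n, lo, hi in hist:
        print(f"n={n:3d}  95% interval=({lo:.3f}, {hi:.3f})")
```

Because the posterior after each batch serves as the prior for the next, the decision to stop or continue sampling can be revisited whenever new reference data arrives, which is the sense in which the cost versus uncertainty trade-off is managed adaptively.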
Pages: 25