Accounting for Training Data Error in Machine Learning Applied to Earth Observations

Times cited: 62
Authors
Elmes, Arthur [1,2]
Alemohammad, Hamed [3]
Avery, Ryan [4]
Caylor, Kelly [4,5]
Eastman, J. Ronald [1]
Fishgold, Lewis [6]
Friedl, Mark A. [7]
Jain, Meha [8]
Kohli, Divyani [9]
Bayas, Juan Carlos Laso [10]
Lunga, Dalton [11]
McCarty, Jessica L. [12,13]
Pontius, Robert Gilmore, Jr. [1]
Reinmann, Andrew B. [14,15]
Rogan, John [1]
Song, Lei [1]
Stoynova, Hristiana [14,15]
Ye, Su [1]
Yi, Zhuang-Fang [16]
Estes, Lyndon [1]
Affiliations
[1] Clark Univ, Grad Sch Geog, Worcester, MA 01610 USA
[2] Univ Massachusetts, Sch Environm, Boston, MA 02125 USA
[3] Radiant Earth Fdn, San Francisco, CA 94105 USA
[4] Univ Calif Santa Barbara, Dept Geog, Santa Barbara, CA 93013 USA
[5] Univ Calif Santa Barbara, Bren Sch Environm Sci & Management, Santa Barbara, CA 93013 USA
[6] Azavea Inc, Philadelphia, PA 19123 USA
[7] Boston Univ, Dept Earth & Environm, Boston, MA 02215 USA
[8] Univ Michigan, Sch Environm & Sustainabil, Ann Arbor, MI 48109 USA
[9] Univ Twente, Fac Geoinformat Sci & Earth Observat ITC, NL-7514 AE Enschede, Netherlands
[10] IIASA, Ecosyst Serv & Management Program, Ctr Earth Observat & Citizen Sci, A-2361 Laxenburg, Austria
[11] Oak Ridge Natl Lab, Natl Secur Emerging Technol, Oak Ridge, TN 37831 USA
[12] Miami Univ, Dept Geog, Oxford, OH 45056 USA
[13] Miami Univ, Geospatial Anal Ctr, Oxford, OH 45056 USA
[14] CUNY, Adv Sci Res Ctr, Environm Sci Initiat, New York, NY 10065 USA
[15] Hunter Coll, Dept Geog & Environm Sci, New York, NY 10065 USA
[16] Dev Seed, Washington, DC 20001 USA
Funding
U.S. National Science Foundation
Keywords
training data; machine learning; map accuracy; error propagation; land-cover classification; support vector machines; ground reference data; accuracy assessment; time-series; neural-network; large-area; image interpretation; spatial-resolution; intensity analysis
DOI
10.3390/rs12061034
Chinese Library Classification (CLC) number
X [Environmental Science, Safety Science]
Subject classification code
08; 0830
Abstract
Remote sensing, or Earth Observation (EO), is increasingly used to understand Earth system dynamics and to create continuous and categorical maps of biophysical properties and land cover, especially through recent advances in machine learning (ML). ML models typically require large, spatially explicit training datasets to make accurate predictions. Training data (TD) are typically generated by digitizing polygons on high spatial-resolution imagery, by collecting in situ data, or by using pre-existing datasets. TD are often assumed to accurately represent the truth, but in practice they almost always contain errors, which stem from (1) sample design and (2) sample collection. The latter is particularly relevant for image-interpreted TD, an increasingly common method due to its practicality and the growing training sample size requirements of modern ML algorithms. TD errors can cause substantial errors in the maps created using ML algorithms, which may affect map use and interpretation. Despite these potential errors and their real-world consequences for map-based decisions, TD error is often not accounted for or reported in EO research. Here we review current practices for collecting and handling TD. We identify the sources of TD error, illustrate their impacts using several case studies representing different EO applications (infrastructure mapping, global surface flux estimates, and agricultural monitoring), and provide guidelines for minimizing and accounting for TD errors. To harmonize terminology, we distinguish TD from three other classes of data that should be used to create and assess ML models: training reference data, used to assess the quality of TD during data generation; validation data, used to iteratively improve models; and map reference data, used only for final accuracy assessment.
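The four-way distinction between data classes can be made concrete with a small bookkeeping structure. The sketch below is a hypothetical illustration, not code from the paper: the `DataRole` enum, the `Sample` record, and `check_map_reference_isolated` are names introduced here. It encodes the one hard constraint stated in the abstract, that map reference data are used only for final accuracy assessment, by flagging any map reference sample that also appears among the training, training reference, or validation samples.

```python
from dataclasses import dataclass
from enum import Enum


class DataRole(Enum):
    """The four data classes named in the abstract (identifiers are illustrative)."""
    TRAINING = "training data"                      # used to fit the ML model
    TRAINING_REFERENCE = "training reference data"  # expert labels used to assess TD quality
    VALIDATION = "validation data"                  # used to iteratively improve the model
    MAP_REFERENCE = "map reference data"            # used only for final accuracy assessment


@dataclass(frozen=True)
class Sample:
    sample_id: str  # hypothetical unique identifier for a labeled location
    role: DataRole
    label: str


def check_map_reference_isolated(samples: list[Sample]) -> set[str]:
    """Return IDs of map reference samples that also appear in any other role.

    An empty set means the map reference data were kept out of model
    development. Training data and training reference data are allowed to
    share locations, since the latter exist to check the former.
    """
    map_ref_ids = {s.sample_id for s in samples if s.role is DataRole.MAP_REFERENCE}
    other_ids = {s.sample_id for s in samples if s.role is not DataRole.MAP_REFERENCE}
    return map_ref_ids & other_ids


if __name__ == "__main__":
    samples = [
        Sample("a1", DataRole.TRAINING, "cropland"),
        Sample("a2", DataRole.TRAINING_REFERENCE, "cropland"),
        Sample("a3", DataRole.VALIDATION, "forest"),
        Sample("a4", DataRole.MAP_REFERENCE, "forest"),
        Sample("a1", DataRole.MAP_REFERENCE, "cropland"),  # leak: a1 was also used for training
    ]
    print(check_map_reference_isolated(samples))  # {'a1'}
```

Keying the check on sample identifiers rather than coordinates is a simplifying assumption; a real workflow might also need to guard against spatially adjacent, and therefore autocorrelated, samples crossing the split.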
We focus primarily on TD, but our advice is generally applicable to all four classes, and we ground our review in established best practices from the map accuracy assessment literature. EO researchers should start by determining the tolerable levels of map error and appropriate error metrics. Next, TD error should be minimized during sample design by choosing a representative spatio-temporal collection strategy, by using spatially and temporally relevant imagery and ancillary data sources during TD creation, and by selecting a set of legend definitions supported by the data. Furthermore, TD error can be minimized during the collection of individual samples by using consensus-based collection strategies, by directly comparing interpreted training observations against expert-generated training reference data to derive TD error metrics, and by providing image interpreters with thorough application-specific training. We strongly advise that TD error be incorporated in model outputs, either directly in bias and variance estimates or, at a minimum, by documenting the sources and implications of error. TD should be fully documented and made available via an open TD repository, allowing others to replicate and assess their use. To guide researchers in this process, we propose three tiers of TD error accounting standards. Finally, we advise researchers to clearly communicate the magnitude and impacts of TD error on map outputs, with specific consideration given to the likely map audience.
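The collection-stage advice above (consensus labeling, then scoring interpreted TD against expert training reference data) can be sketched in a few lines. This is a hypothetical illustration under simple assumptions, not the authors' procedure: each sample receives categorical labels from several interpreters, consensus is a plain majority vote with ties left unresolved, and the TD error metrics are overall agreement plus per-class commission and omission error rates. The function names and example labels are invented here.

```python
from collections import Counter
from typing import Optional


def majority_label(labels: list[str]) -> Optional[str]:
    """Consensus label by simple majority; None if empty or tied for first place."""
    if not labels:
        return None
    counts = Counter(labels).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # no consensus: route to expert review rather than guess
    return counts[0][0]


def td_error_metrics(interpreted: list[str], reference: list[str]) -> dict:
    """Compare interpreted TD labels with expert training reference labels.

    Samples without consensus (None) should be resolved or excluded first.
    Commission error: share of samples interpreted as a class that the expert
    labeled otherwise. Omission error: share of expert-labeled samples of a
    class that the interpreters assigned elsewhere. Undefined rates are None.
    """
    if len(interpreted) != len(reference):
        raise ValueError("label lists must be the same length")
    pairs = Counter(zip(interpreted, reference))  # (interpreted, reference) -> count
    classes = sorted(set(interpreted) | set(reference))
    agree = sum(pairs[(c, c)] for c in classes)
    per_class = {}
    for c in classes:
        n_interp = sum(v for (i, _), v in pairs.items() if i == c)
        n_ref = sum(v for (_, r), v in pairs.items() if r == c)
        hit = pairs[(c, c)]
        per_class[c] = {
            "commission": 1 - hit / n_interp if n_interp else None,
            "omission": 1 - hit / n_ref if n_ref else None,
        }
    return {"overall_agreement": agree / len(reference), "per_class": per_class}


if __name__ == "__main__":
    votes = [  # three hypothetical interpreters per sample
        ["cropland", "cropland", "forest"],
        ["forest", "forest", "forest"],
        ["cropland", "forest", "forest"],
        ["cropland", "cropland", "cropland"],
    ]
    interpreted = [majority_label(v) for v in votes]
    expert = ["cropland", "forest", "cropland", "cropland"]
    metrics = td_error_metrics(interpreted, expert)
    print(metrics["overall_agreement"])  # 0.75
```

Leaving ties unresolved, instead of breaking them arbitrarily, keeps ambiguous samples visible so they can be sent back for expert labeling; the per-class rates can then be reported alongside the TD as one possible form of the documentation the abstract recommends.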
Pages: 39