LASSO (L1) Regularization for Development of Sparse Remote-Sensing Models with Applications in Optically Complex Waters Using GEE Tools

Cited by: 6
Authors
Cardall, Anna Catherine [1]
Hales, Riley Chad [2]
Tanner, Kaylee Brooke [2]
Williams, Gustavious Paul [2]
Markert, Kel N. [3]
Affiliations
[1] Brigham Young Univ, Dept Chem Engn, Provo, UT 84602 USA
[2] Brigham Young Univ, Dept Civil & Construct Engn, Provo, UT 84602 USA
[3] Google LLC, Mountain View, CA 94043 USA
Funding
National Aeronautics and Space Administration (NASA);
Keywords
remote sensing; water quality; model development; linear regression; LASSO regularization; L1; coincident data; Google Earth Engine; CHLOROPHYLL-A; LANDSAT; INLAND; RESERVOIR; CLARITY; BLOOMS;
DOI
10.3390/rs15061670
Chinese Library Classification number
X [Environmental Science, Safety Science];
Discipline classification code
08 ; 0830 ;
Abstract
Remote-sensing data are used extensively to monitor water quality parameters such as clarity, temperature, and chlorophyll-a (chl-a) content. This is generally achieved by collecting in situ data coincident with satellite data collections and then creating empirical water quality models using approaches such as multi-linear regression or step-wise linear regression. These approaches, which require modelers to select model parameters, may not be well suited for optically complex waters, where interference from suspended solids, dissolved organic matter, or other constituents may act as "confusers". For these waters, it may be useful to include non-standard terms, which might not be considered when using traditional methods. Recent machine-learning work has demonstrated an ability to explore large feature spaces and generate accurate empirical models that do not require parameter selection. However, because of the large number of terms they include, these methods result in models that are not explainable and cannot be analyzed. We explore the use of Least Absolute Shrinkage and Selection Operator (LASSO), or L1, regularization to fit linear regression models and produce parsimonious models with limited terms to enable interpretation and explainability. We demonstrate this approach with a case study in which chl-a models are developed for Utah Lake, Utah, USA, an optically complex freshwater body, and compare the resulting model terms to model terms from the literature. We discuss trade-offs between interpretability and model performance while using L1 regularization as a tool. The resulting model terms are both similar to and distinct from those in the literature, thereby suggesting that this approach is useful for the development of models for optically complex water bodies where standard model terms may not be optimal.
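The way an L1 penalty yields a model with few terms can be illustrated with a minimal sketch. It is not the authors' implementation (the abstract states that their notebook uses scikit-learn); it is a from-scratch cyclic coordinate descent, written with the standard library only, that minimizes (1/(2n))·||y − Xw||² + alpha·||w||₁, the same form of objective documented for scikit-learn's `Lasso`. The function names, the toy design matrix, and the value of `alpha` are all assumptions made for illustration, and the data are assumed to be centered so that no intercept is needed.

```python
def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; anything in [-gamma, gamma] becomes exactly 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_sweeps=100):
    """Minimize (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1 by cyclic coordinate descent.

    X is a list of rows and y a list of targets; both are assumed centered,
    so the sketch fits no intercept.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            rho = 0.0   # correlation of feature j with the partial residual
            norm = 0.0  # squared length of feature j
            for i in range(n):
                pred_others = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_others)
                norm += X[i][j] ** 2
            # Soft-thresholding is what sets weakly informative terms to exactly zero.
            w[j] = soft_threshold(rho / n, alpha) / (norm / n) if norm > 0 else 0.0
    return w


# Toy, hypothetical data: two centered, mutually orthogonal features.
# The target depends strongly on the first feature and weakly on the second.
X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y = [2.1, -1.9, 1.9, -2.1]  # equals 2.0 * x1 + 0.1 * x2

w_ols = lasso_coordinate_descent(X, y, alpha=0.0)    # no penalty: both terms kept
w_lasso = lasso_coordinate_descent(X, y, alpha=0.5)  # penalty drops the weak term
print("alpha=0.0:", w_ols)
print("alpha=0.5:", w_lasso)
```

With the penalty switched on, the weak second term is removed and the strong first term is shrunk, which is the trade-off between parsimony and fit that the abstract describes. In practice, a library implementation with cross-validated selection of `alpha` would normally replace this hand-written loop.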
We investigate the effect of non-coincident data, that is, the length of time between satellite image collection and in situ sampling, on model performance. We find that, for Utah Lake (for which there are extensive data available), three days is the limit, but 12 h provides the best trade-off. This value is site-dependent, and researchers should use site-specific numbers. To document and explain our approach, we provide two Colab notebooks: one for compiling near-coincident pairs of remote-sensing and in situ data using Google Earth Engine (GEE), and a second implementing L1 model creation using scikit-learn. The second notebook includes data-engineering routines with which to generate band ratios, logs, and other combinations. The notebooks can be easily modified to adapt them to other locations, sensors, or parameters.
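The idea of near-coincident pairing can be sketched as follows. This is an illustrative, standard-library-only example and not the authors' GEE notebook: it matches each in situ sample to the nearest image acquisition time and keeps the pair only if the offset falls within a chosen window. The function names, the timestamps, and the two window sizes (12 h and three days, taken from the abstract) are used here purely as assumptions for the demonstration.

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def nearest_image(sample_time, image_times):
    """Return the acquisition time closest to sample_time.

    image_times must be sorted and non-empty.
    """
    i = bisect_left(image_times, sample_time)
    # Only the neighbors on either side of the insertion point can be nearest.
    candidates = image_times[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda t: abs(t - sample_time))


def pair_within_window(sample_times, image_times, max_gap):
    """Pair each sample with its nearest image if the time offset is <= max_gap."""
    image_times = sorted(image_times)
    pairs = []
    for s in sample_times:
        img = nearest_image(s, image_times)
        gap = abs(img - s)
        if gap <= max_gap:
            pairs.append((s, img, gap))
    return pairs


# Hypothetical timestamps, for illustration only.
images = [datetime(2020, 7, 1, 18, 0), datetime(2020, 7, 17, 18, 0)]
samples = [
    datetime(2020, 7, 1, 10, 0),   # 8 h before the first image
    datetime(2020, 7, 3, 12, 0),   # 42 h after the first image
    datetime(2020, 7, 10, 12, 0),  # more than 7 days from either image
]

pairs_12h = pair_within_window(samples, images, timedelta(hours=12))
pairs_3d = pair_within_window(samples, images, timedelta(days=3))
print(len(pairs_12h), "pair(s) within 12 h;", len(pairs_3d), "pair(s) within 3 days")
```

Widening the window adds training pairs at the cost of a larger time offset per pair, which is why the abstract treats the window as a site-specific trade-off.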
Pages: 31