Data augmentation for bias correction in mapping PM2.5 based on satellite retrievals and ground observations

Cited by: 5
Authors
Mi, Tan [1,2]
Tang, Die [2]
Fu, Jianbo [2]
Zeng, Wen [3]
Grieneisen, Michael L. [4]
Zhou, Zihang [5]
Jia, Fengju [6]
Yang, Fumo [1]
Zhan, Yu [1,2]
Affiliations
[1] Sichuan Univ, Coll Carbon Neutral Future Technol, Chengdu 610065, Sichuan, Peoples R China
[2] Sichuan Univ, Dept Environm Sci & Engn, Chengdu 610065, Sichuan, Peoples R China
[3] Sichuan Univ, Inst Disaster Management & Reconstruct, Chengdu 610200, Sichuan, Peoples R China
[4] Univ Calif Davis, Dept Land Air & Water Resources, Davis, CA 95616 USA
[5] Chengdu Acad Environm Sci, Chengdu 610072, Sichuan, Peoples R China
[6] Sichuan Chengdu Ecol & Environm Monitoring Ctr, Chengdu 610011, Sichuan, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Aerosol optical depth; Dataset shift; Spatiotemporal Distribution; Air quality monitoring; Multiple imputation by chained equations; AEROSOL OPTICAL DEPTH; CHINA; POLLUTION; NETWORKS; TRENDS; SHIFT;
DOI
10.1016/j.gsf.2023.101686
Chinese Library Classification (CLC) number
P [Astronomy, Earth Sciences];
Discipline classification code
07;
Abstract
As most air quality monitoring sites are in urban areas worldwide, machine learning models may produce substantial estimation bias in rural areas when deriving spatiotemporal distributions of air pollutants. The bias stems from the issue of dataset shift, as the density distributions of predictor variables differ greatly between urban and rural areas. We propose a data-augmentation approach based on multiple imputation by chained equations (MICE-DA) to remedy the dataset shift problem. Compared with the benchmark models, MICE-DA exhibits superior predictive performance in deriving the spatiotemporal distributions of hourly PM2.5 in the megacity of Chengdu at the foot of the Tibetan Plateau, especially in correcting the estimation bias, with the mean bias decreasing from -3.4 μg/m³ to -1.6 μg/m³. As a complement to the holdout validation, the semi-variance results show that MICE-DA decently preserves the spatial autocorrelation pattern of PM2.5 over the study area. The essence of MICE-DA is strengthening the correlation between PM2.5 and aerosol optical depth (AOD) during the data augmentation. Consequently, the importance of AOD is largely enhanced for predicting PM2.5, and the summed relative importance value of the two satellite-retrieved AOD variables increases from 5.5% to 18.4%. This study resolved the puzzle that AOD exhibited relatively lower importance in local or regional studies. The results of this study can advance the utilization of satellite remote sensing in modeling air quality while drawing more attention to the common dataset shift problem in data-driven environmental research. © 2023 China University of Geosciences (Beijing) and Peking University. Published by Elsevier B.V. on behalf of China University of Geosciences (Beijing). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
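The abstract does not spell out how MICE-DA is implemented, so the following is only a minimal, standard-library sketch of the general idea of imputation by chained equations used to fill unmonitored rural cells. Everything in it is an assumption made for illustration: the three columns (AOD, temperature, PM2.5), the synthetic data, the linear regressions, the helper names `fit_ols`, `predict`, `make_synthetic` and `chained_imputation`, and the choice of five imputed copies. It should not be read as the authors' method.

```python
import random
import statistics


def fit_ols(X, y):
    """Least-squares fit of y ~ b0 + b1*x1 + ... via the normal equations."""
    rows = [[1.0] + list(x) for x in X]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y], solved by Gauss-Jordan with partial pivoting.
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


def predict(beta, x):
    return beta[0] + sum(b * v for b, v in zip(beta[1:], x))


def make_synthetic(seed=1):
    """Hypothetical grid cells: [AOD, temperature, PM2.5]; None marks a gap."""
    rng = random.Random(seed)
    rows = []
    for k in range(60):
        urban = k < 30
        aod = rng.uniform(0.5, 1.2) if urban else rng.uniform(0.1, 0.7)
        temp = rng.uniform(5.0, 30.0)
        pm25 = 20.0 + 60.0 * aod - 0.5 * temp + rng.gauss(0.0, 3.0)
        if not urban:
            pm25 = None          # assumed: no ground monitor in rural cells
            if k % 5 == 0:
                aod = None       # assumed: some retrievals are unavailable
        rows.append([aod, temp, pm25])
    return rows


def chained_imputation(data, n_iter=10, seed=0):
    """Return one completed copy of `data`; observed values are left untouched."""
    rng = random.Random(seed)
    n_rows, n_cols = len(data), len(data[0])
    missing = [[v is None for v in row] for row in data]
    filled = [list(row) for row in data]
    # Start from column means of the observed values.
    for j in range(n_cols):
        mean_j = statistics.fmean(r[j] for r in data if r[j] is not None)
        for i in range(n_rows):
            if missing[i][j]:
                filled[i][j] = mean_j
    # Cycle through the incomplete columns, regressing each on all the others.
    for _ in range(n_iter):
        for j in range(n_cols):
            miss = [i for i in range(n_rows) if missing[i][j]]
            if not miss:
                continue
            obs = [i for i in range(n_rows) if not missing[i][j]]
            X = [[filled[i][c] for c in range(n_cols) if c != j] for i in obs]
            y = [filled[i][j] for i in obs]
            beta = fit_ols(X, y)
            sd = statistics.pstdev(yi - predict(beta, xi) for xi, yi in zip(X, y))
            for i in miss:
                x_i = [filled[i][c] for c in range(n_cols) if c != j]
                # Add residual noise so repeated runs give different plausible values.
                filled[i][j] = predict(beta, x_i) + rng.gauss(0.0, sd)
    return filled


data = make_synthetic()
# "Multiple" imputation: several completed copies drawn with different seeds.
completed_sets = [chained_imputation(data, seed=s) for s in range(5)]
```

Each entry of `completed_sets` is a full table in which the rural rows now carry imputed PM2.5 values. One possible use, again an assumption rather than something the abstract states, is to append those rows to the training data so that a downstream model sees the rural range of AOD values as well as the urban one.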
Pages: 12