Data Integration for ML-CNPM2.5: A Public Sample Dataset Based on Machine Learning Models and Remote Sensing Technology Applied for Estimating Ground-Level PM2.5 in China

Cited by: 2
Authors
Fan, Yulong [1]
Sun, Lin [1]
Liu, Xirong [1]
Affiliations
[1] Shandong Univ Sci & Technol, Coll Geodesy & Geomat, Qingdao 266590, Peoples R China
Source
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING | 2024 / Vol. 62
Funding
National Natural Science Foundation of China;
Keywords
Aerosol optical depth (AOD); machine learning (ML); particulate matter (PM2.5); sample dataset; satellite remote sensing; BURDEN;
DOI
10.1109/TGRS.2024.3436006
Chinese Library Classification number
P3 [Geophysics]; P59 [Geochemistry];
Discipline classification code
0708 ; 070902 ;
Abstract
Ambient fine particulate matter (PM2.5) has significant adverse effects on human health, creating an urgent need for accurate monitoring of ground-level PM2.5, especially its spatial distribution. Because satellites observe the Earth at large spatial scales, remote sensing can be used to estimate PM2.5 concentrations at the national level. Combining remote sensing with machine learning (ML) methods, numerous studies have mapped PM2.5 with high accuracy and full, continuous coverage. However, the different models and data used in these studies make their results hard to compare, and more samples need to be provided. Here, a large, long-term sample dataset (ML-CNPM2.5) for ML-based models was constructed, containing 5,076,608 data records and 24 features covering China from 2014 to 2023. Multiple approaches were used to ensure the quantity and quality of the sample dataset. Because of its comprehensiveness and objectivity, ML-CNPM2.5 can be used to train and validate different models, thereby further improving the accuracy of PM2.5 estimation. Using ML-CNPM2.5, eight basic ML-based models were also constructed as baselines for judging other derivative models. These models can estimate daily full-coverage PM2.5, and most performed well, with ten-fold cross-validation (10-CV) RMSE of 16.94-11.21 µg/m³ and R² of 0.71-0.89, which is consistent with previous studies; they can also effectively capture spatial trends of PM2.5 during a period of high pollution. Overall, ML-CNPM2.5 can be applied to construct, validate, and compare various ML-based models for PM2.5 estimation, helping to develop new algorithms with higher accuracy and robustness.
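The ten-fold cross-validation metrics quoted in the abstract (RMSE and R²) can be illustrated with a minimal sketch. Everything here is an assumption for illustration only: the data are synthetic rather than drawn from ML-CNPM2.5, a one-feature least-squares line stands in for the ML models, the folds are split by sample (the abstract does not say whether the split is by sample or by site), and the metrics are computed over pooled out-of-fold predictions. The function names are hypothetical.

```python
import math
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and split the indices into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (a stand-in for an ML model)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def cross_validate(xs, ys, k=10, seed=0):
    """Return (RMSE, R^2) over pooled out-of-fold predictions."""
    folds = k_fold_indices(len(xs), k, seed)
    preds = [0.0] * len(xs)
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        # Fit only on the training folds, then predict the held-out fold.
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in held_out:
            preds[i] = a + b * xs[i]
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    rmse = math.sqrt(ss_res / len(ys))
    r2 = 1.0 - ss_res / ss_tot
    return rmse, r2

# Synthetic demo data (NOT from ML-CNPM2.5): a made-up AOD-like feature
# and a made-up PM2.5-like target with Gaussian noise.
rng = random.Random(42)
aod = [rng.uniform(0.1, 2.0) for _ in range(500)]
pm25 = [20.0 + 40.0 * x + rng.gauss(0.0, 10.0) for x in aod]

rmse, r2 = cross_validate(aod, pm25, k=10)
print(f"10-CV RMSE = {rmse:.2f}, R2 = {r2:.2f}")
```

Because every sample is predicted exactly once by a model that never saw it, the pooled RMSE and R² describe out-of-sample performance, which is what makes such figures usable as a baseline for comparing models trained on the same samples.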
Pages: 15
Related papers
6 in total
  • [1] DEEP LEARNING FOR GROUND-LEVEL PM2.5 PREDICTION FROM SATELLITE REMOTE SENSING DATA
    Li, Tongwen
    Shen, Huanfeng
    Yuan, Qiangqiang
    Zhang, Liangpei
    IGARSS 2018 - 2018 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM, 2018, : 7581 - 7584
  • [2] Deep Learning Architecture for Estimating Hourly Ground-Level PM2.5 Using Satellite Remote Sensing
    Sun, Yibo
    Zeng, Qiaolin
    Geng, Bing
    Lin, Xingwen
    Sude, Bilige
    Chen, Liangfu
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2019, 16 (09) : 1343 - 1347
  • [3] Using satellite remote sensing data to estimate the high-resolution distribution of ground-level PM2.5
    Lin, Changqing
    Li, Ying
    Yuan, Zibing
    Lau, Alexis K. H.
    Li, Chengcai
    Fung, Jimmy C. H.
    REMOTE SENSING OF ENVIRONMENT, 2015, 156 : 117 - 128
  • [4] Estimating Ground-Level PM2.5 Using Fine-Resolution Satellite Data in the Megacity of Beijing, China
    Li, Rong
    Gong, Jianhua
    Chen, Liangfu
    Wang, Zifeng
    AEROSOL AND AIR QUALITY RESEARCH, 2015, 15 (04) : 1347 - 1356
  • [5] Application of satellite remote sensing data and random forest approach to estimate ground-level PM2.5 concentration in Northern region of Thailand
    Pimchanok Wongnakae
    Pakkapong Chitchum
    Rungduen Sripramong
    Arthit Phosri
    Environmental Science and Pollution Research, 2023, 30 : 88905 - 88917
  • [6] Application of satellite remote sensing data and random forest approach to estimate ground-level PM2.5 concentration in Northern region of Thailand
    Wongnakae, Pimchanok
    Chitchum, Pakkapong
    Sripramong, Rungduen
    Phosri, Arthit
    ENVIRONMENTAL SCIENCE AND POLLUTION RESEARCH, 2023, 30 (38) : 88905 - 88917