Predicting PM2.5 levels and exceedance days using machine learning methods

Cited by: 12
Authors
Gao, Ziqi [1, 4]
Do, Khanh [2, 3]
Li, Zongrun [1]
Jiang, Xiangyu [4]
Maji, Kamal J. [1]
Ivey, Cesunica E. [2, 5]
Russell, Armistead G. [1]
Affiliations
[1] Georgia Inst Technol, Sch Civil & Environm Engn, Atlanta, GA 30332 USA
[2] Univ Calif Riverside, Dept Chem & Environm Engn, Riverside, CA 92521 USA
[3] Ctr Environm Res & Technol, Riverside, CA USA
[4] Georgia Environm Protect Div, Atlanta, GA 30354 USA
[5] Univ Calif Berkeley, Dept Civil & Environm Engn, Berkeley, CA 94720 USA
Keywords
PM2.5; South coast air basin; Support vector machine; Random forest; Neural network; AIR-POLLUTION; CLASSIFICATION; INFORMATION; EMISSIONS; INCREASES; MORTALITY; OZONE;
DOI
10.1016/j.atmosenv.2024.120396
Chinese Library Classification number
X [Environmental science, safety science]
Discipline classification code
08; 0830
Abstract
Machine learning methods are increasingly used in air quality research to investigate how air pollutant levels relate to emissions and meteorological changes over time. This research supports both scientific investigation and policy assessment and development. However, few studies have compared the performance of different machine learning methods. To address this gap, this paper employed several machine learning techniques, including decision tree, random forest (RF), support vector machine (SVM), support vector regression (SVR), k-nearest neighbor, neural network, and Gaussian process regression, to predict daily average PM2.5 levels and the number of PM2.5 exceedance days in the South Coast Air Basin of California from 2000 to 2019. The models were trained with meteorological factors, estimated emissions, and large-scale climate indices as inputs. The SVR model had the highest predictive accuracy for PM2.5 levels, and the SVM model gave the most accurate predictions of the number of PM2.5 exceedance days. The decision tree model was the least accurate. The results also showed that emissions have a greater impact than meteorological factors on PM2.5 levels over time, though meteorology is responsible for daily variability. The most important meteorological factors were surface relative humidity and relative humidity at 850 mbar, which are related to partitioning, cloud cover, and wet deposition. We conducted sensitivity tests on the models' response to emissions and meteorological factors. The PM2.5 predicted by RF and SVR correlated strongly with emissions in the early period (2000-2010). However, the changes were minimal in more recent years (2011-2019), implying that the machine learning models are biased: they consistently predict minimum PM2.5 levels at a baseline.
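As a rough illustration of the two prediction targets named in the abstract (daily average PM2.5 and the count of exceedance days), the following minimal sketch runs a k-nearest-neighbor regressor, one of the methods listed, on made-up data. Everything here is an assumption for illustration: the function names, the two toy features, the data values, and the 35 µg/m³ threshold (the US 24-hour PM2.5 standard; the abstract does not say which threshold the authors used). It is not the authors' model or data.

```python
import math

# Assumption: 35 µg/m³ (US 24-hour PM2.5 standard). The abstract does not
# state the exceedance threshold used in the paper.
EXCEEDANCE_THRESHOLD = 35.0


def knn_predict(train_X, train_y, x, k=3):
    """Predict a value for x as the mean target of its k nearest training rows."""
    # Pair each Euclidean distance with its target, then keep the k closest.
    dists = sorted((math.dist(row, x), y) for row, y in zip(train_X, train_y))
    nearest = dists[:k]
    return sum(y for _, y in nearest) / len(nearest)


def count_exceedance_days(daily_pm25, threshold=EXCEEDANCE_THRESHOLD):
    """Count days whose daily average PM2.5 is strictly above the threshold."""
    return sum(1 for value in daily_pm25 if value > threshold)


# Hypothetical toy data: (relative humidity fraction, scaled emissions index)
# mapped to a daily average PM2.5 value in µg/m³. These numbers are invented.
train_X = [(0.2, 1.0), (0.4, 1.0), (0.6, 1.0), (0.8, 1.0),
           (0.2, 0.5), (0.4, 0.5), (0.6, 0.5), (0.8, 0.5)]
train_y = [30.0, 36.0, 42.0, 48.0, 15.0, 18.0, 21.0, 24.0]

# Three hypothetical days to predict.
test_X = [(0.5, 1.0), (0.3, 0.5), (0.7, 1.0)]
predictions = [knn_predict(train_X, train_y, x, k=2) for x in test_X]
n_exceed = count_exceedance_days(predictions)
```

In this toy setup the regression output feeds the exceedance count directly; the abstract reports that a classifier (SVM) worked best for exceedance days, so treating the count as a thresholded regression is a simplification.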
Pages: 9
Related papers
47 in total
[1]  
[Anonymous], 2020, Climate Data Online
[2]  
Ba J, 2014, ACS SYM SER
[3]   Effects of dataset size and interactions on the prediction performance of logistic regression and deep learning models [J].
Bailly, Alexandre ;
Blanc, Corentin ;
Francis, Elie ;
Guillotin, Thierry ;
Jamal, Fadi ;
Wakim, Bechara ;
Roy, Pascal .
COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE, 2022, 213
[4]  
Belyaev M., 2014, arXiv
[5]   Combining Machine Learning and Numerical Simulation for High-Resolution PM2.5 Concentration Forecast [J].
Bi, Jianzhao ;
Knowland, K. Emma ;
Keller, Christoph A. ;
Liu, Yang .
ENVIRONMENTAL SCIENCE & TECHNOLOGY, 2022, 56 (03) :1544-1556
[6]   Random forests [J].
Breiman, L .
MACHINE LEARNING, 2001, 45 (01) :5-32
[7]  
Breiman L., 2017, Classification and regression trees
[8]  
CARB, 2020, Air quality and meteorological information system (AQMIS)
[9]  
CARB, 2022, CEPAM: 2019 SIP-standard emission tool emission projections by summary category base year: 2017
[10]   LIBSVM: A Library for Support Vector Machines [J].
Chang, Chih-Chung ;
Lin, Chih-Jen .
ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY, 2011, 2 (03)