Evaluation of the prediction effectiveness for geochemical mapping using machine learning methods: A case study from northern Guangdong Province in China

Cited by: 3
Authors
Lv, Songjian [1, 2]
Zhu, Ying [1, 2]
Cheng, Li [1, 2]
Zhang, Jingru [1, 2, 3]
Shen, Wenjie [4]
Li, Xingyuan [1, 2]
Affiliations
[1] Lanzhou Univ, Coll Earth & Environm Sci, Ctr Hlth Geol, Minist Educ, Lanzhou 730000, Peoples R China
[2] Lanzhou Univ, Carbon Peak & Carbon Neutral Lanzhou Univ, Coll Earth & Environm Sci, Key Lab Western Chinas Environm Syst,Minist Educ, Lanzhou 730000, Peoples R China
[3] Guangdong Prov Acad Environm Sci, Guangzhou 510045, Peoples R China
[4] Sun Yat Sen Univ, Sch Earth Sci & Engn, Zhuhai 519000, Peoples R China
Keywords
Machine learning; Heavy metals; Kriging interpolation; Accurate prediction; ARTIFICIAL NEURAL-NETWORKS; SOIL ORGANIC-CARBON; SPATIAL-DISTRIBUTION; HEAVY-METALS; REGIONAL-SCALE; GROUNDWATER; REGRESSION; RISK; HEALTH; MATTER;
DOI
10.1016/j.scitotenv.2024.172223
Chinese Library Classification
X [Environmental Science, Safety Science]
Discipline classification code
08; 0830
Abstract
This study compares seven machine learning models to investigate whether they improve the accuracy of geochemical mapping relative to ordinary kriging (OK). Arsenic, which is highly toxic, is widespread in soil as a result of human activities and the soil parent material, and predicting the spatial distribution of elements in soil has become an active research topic. Lianzhou City in northern Guangdong Province, China, was chosen as the study area, where a total of 2908 surface soil samples were collected from 0 to 20 cm depth. Seven machine learning models were chosen: Random Forest (RF), Support Vector Machine (SVM), Ridge Regression (Ridge), Gradient Boosting Decision Tree (GBDT), Artificial Neural Network (ANN), K-Nearest Neighbors (KNN), and Gaussian Process Regression (GPR). The study explores the advantages and disadvantages of machine learning and traditional geostatistical models in predicting the spatial distribution of heavy metal elements, and analyzes the factors that affect the accuracy of element prediction. Among the original models, the two best performers, RF (R² = 0.445) and GBDT (R² = 0.414), did not outperform OK (R² = 0.459) in prediction accuracy. Ridge and GPR, the worst-performing methods, had R² values of only 0.201 and 0.248, respectively. To improve the models' prediction accuracy, a spatial regionalized (SR) covariate index was added. The improvement varied among methods: RF and GBDT increased their R² values from about 0.4 to 0.78, whereas GPR improved the least, with its R² reaching only 0.25 in the improved method. The study concludes that choosing the right machine learning model and considering the factors that influence prediction accuracy, such as regional variations, the number of sampling points, and their distribution, are crucial for accurate predictions. These findings provide a reference for future research in this area.
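The abstract ranks every model by its R² value. As a minimal sketch of how such a score is computed, assuming the standard coefficient of determination (1 minus the ratio of residual to total sum of squares), the function below compares two sets of predictions against the same observations. The function name `r_squared` and all sample numbers are hypothetical illustrations, not data or code from the study, and the abstract does not state which validation scheme the authors used.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (standard definition, assumed here)."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: squared error of the model's predictions.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: squared error of always predicting the mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


# Hypothetical element concentrations at five held-out sampling points.
observed = [10.0, 12.0, 15.0, 20.0, 18.0]
# Hypothetical predictions from a model that tracks the observations closely.
predicted_close = [11.0, 12.5, 14.0, 18.0, 17.0]
# Baseline that always predicts the observed mean (15.0); its R^2 is 0 by construction.
predicted_mean = [15.0] * len(observed)

score_close = r_squared(observed, predicted_close)
score_mean = r_squared(observed, predicted_mean)
print(f"close model R^2: {score_close:.3f}")
print(f"mean baseline R^2: {score_mean:.3f}")
```

Under this definition a score of 0 means the model does no better than predicting the mean everywhere, which gives a sense of scale for the reported values between 0.201 and 0.78.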
Pages: 10