An optimization framework with dimensionality reduction using Markov Chain Monte Carlo and genetic algorithms for groundwater potential assessment

Cited by: 0
Authors
Wang, Zitao [1, 2, 3]
Yue, Chao [1, 2, 3]
Wang, Jianping [1, 2]
Affiliations
[1] Chinese Acad Sci, Qinghai Inst Salt Lakes, Key Lab Comprehens & Highly Efficient Utilizat Sal, Xining 810008, Peoples R China
[2] Qinghai Prov Key Lab Geol & Environm Salt Lakes, Xining 810008, Peoples R China
[3] Univ Chinese Acad Sci, Beijing 100049, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Groundwater potential assessment; Dimensionality reduction; Genetic algorithm; MCMC; Automated machine learning; JIANGHAN PLAIN; LOGISTIC-REGRESSION; RANDOM FOREST; GIS; VARIABILITY; WEIGHTS; MACHINE; MODELS;
DOI
10.1016/j.asoc.2024.111991
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Limited samples and high-dimensional feature spaces often hinder the accuracy of machine learning (ML) models in regional groundwater potential assessment (GPA). This study proposes a novel framework, GPA with Dimensionality Optimization (GPADO), that optimizes feature dimension reduction to enhance prediction performance. Taking the Jianghan Basin as an example, the study gathered data on nine continuous variables and five categorical variables influencing the region's groundwater potential; one-hot encoding of the categorical variables expanded the feature set to 37. Three scenarios were devised to compare prediction outcomes under different dimensionality reduction approaches. The comparison showed that a hybrid dimension reduction method, applied to both continuous and categorical variables, yielded the highest validation set accuracy. Consequently, a genetic algorithm and Markov Chain Monte Carlo methods were employed to determine the optimal solution and the uncertainties associated with four unknown parameters: the dimension reduction method and the number of retained dimensions for the continuous variables, and the same two choices for the categorical variables. Results indicated that using singular value decomposition to reduce the categorical variables to three dimensions, coupled with principal component analysis to reduce the continuous variables to three dimensions, produced the highest model validation accuracy of 0.834 within the GPADO framework. This optimal configuration was then used for automated ML training, resulting in a final validation set accuracy of 0.851 and a test set accuracy of 0.836. The resulting model provided a more precise spatial distribution of groundwater potential and demonstrated the GPADO framework's effectiveness in improving GPA accuracy, particularly in data-scarce regions. The GPADO framework offers a valuable approach for enhancing GPA studies.
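The reported optimal configuration (principal component analysis on the continuous variables, singular value decomposition on the one-hot encoded categorical variables, three dimensions each) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the data are synthetic, the helper names `one_hot`, `pca_reduce`, and `svd_reduce` are hypothetical, and the per-variable category counts in `levels` are an assumption chosen only so that the encoded feature count matches the 37 stated in the abstract.

```python
import numpy as np

def one_hot(codes, n_levels):
    """One-hot encode an integer-coded column into an (n, n_levels) 0/1 matrix."""
    return np.eye(n_levels)[codes]

def pca_reduce(X, k):
    """Standardize the columns, then project onto the top-k principal components."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt[:k].T

def svd_reduce(X, k):
    """Truncated SVD without centering: top-k left singular vectors scaled by singular values."""
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * s[:k]

rng = np.random.default_rng(0)
n = 200  # synthetic sample count, not from the paper

# Nine continuous variables (synthetic stand-ins).
continuous = rng.normal(size=(n, 9))

# Five categorical variables; the level counts are assumed and sum to 28,
# so that 9 + 28 = 37 features as stated in the abstract.
levels = [4, 6, 5, 7, 6]
categorical = np.hstack([one_hot(rng.integers(0, m, size=n), m) for m in levels])

full = np.hstack([continuous, categorical])
reduced = np.hstack([pca_reduce(continuous, 3), svd_reduce(categorical, 3)])

print(full.shape, reduced.shape)
```

In the framework described above, the two method choices and the two retained-dimension counts would be the four parameters searched by the genetic algorithm and sampled by MCMC; here they are simply fixed at the reported optimum.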
Pages: 15