An optimization framework with dimensionality reduction using Markov Chain Monte Carlo and genetic algorithms for groundwater potential assessment
Times cited: 0
Authors:
Wang, Zitao [1,2,3]
Yue, Chao [1,2,3]
Wang, Jianping [1,2]
Affiliations:
[1] Chinese Acad Sci, Qinghai Inst Salt Lakes, Key Lab Comprehens & Highly Efficient Utilizat Sal, Xining 810008, Peoples R China
[2] Qinghai Prov Key Lab Geol & Environm Salt Lakes, Xining 810008, Peoples R China
[3] Univ Chinese Acad Sci, Beijing 100049, Peoples R China
Funding:
National Natural Science Foundation of China;
Keywords:
Groundwater potential assessment;
Dimensionality reduction;
Genetic algorithm;
MCMC;
Automated machine learning;
JIANGHAN PLAIN;
LOGISTIC-REGRESSION;
RANDOM FOREST;
GIS;
VARIABILITY;
WEIGHTS;
MACHINE;
MODELS;
DOI:
10.1016/j.asoc.2024.111991
Chinese Library Classification (CLC) number:
TP18 [Artificial intelligence theory];
Discipline codes:
081104;
0812;
0835;
1405;
Abstract:
Limited samples and high-dimensional feature spaces often reduce the accuracy of machine learning (ML) models in regional groundwater potential assessment (GPA). This study proposes a framework, GPA with Dimensionality Optimization (GPADO), that optimizes feature dimensionality reduction to improve prediction performance. Using the Jianghan Basin as a case study, data were collected on nine continuous and five categorical variables influencing groundwater potential in the region; one-hot encoding of the categorical variables expanded the feature set to 37 features. Three scenarios were designed to compare prediction outcomes under different dimensionality reduction approaches. The comparison showed that a hybrid approach, reducing both the continuous and the categorical variables, yielded the highest validation accuracy. A genetic algorithm (GA) and Markov chain Monte Carlo (MCMC) were then used to determine the optimal values, and the associated uncertainties, of four unknown parameters: the dimensionality reduction method applied to the continuous variables and to the categorical variables, and the number of dimensions retained for each. Within the GPADO framework, reducing the categorical variables to three dimensions with singular value decomposition (SVD) and the continuous variables to three dimensions with principal component analysis (PCA) gave the highest model validation accuracy, 0.834. Using this optimal configuration for automated ML training produced a final validation accuracy of 0.851 and a test accuracy of 0.836. The resulting model yielded a more precise spatial distribution of groundwater potential, demonstrating that GPADO improves GPA accuracy, particularly in data-scarce regions. The GPADO framework offers a useful approach for enhancing GPA studies.
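The abstract does not give implementation details, so the following is only a minimal sketch, assuming NumPy, of the two ideas it describes: a hybrid reduction that one-hot encodes the categorical variables and reduces each variable group separately (PCA or SVD), and a GA search over the four parameters (method and retained dimensions for each group). MCMC uncertainty estimation and automated ML training are not shown. All function names, the GA operators (truncation selection, uniform crossover, single-gene mutation) and the fitness function are hypothetical; in GPADO the fitness would presumably be the validation accuracy of a model trained on the reduced features.

```python
import numpy as np


def one_hot(codes):
    """One-hot encode a 2-D array of integer category codes, column by column."""
    blocks = []
    for j in range(codes.shape[1]):
        levels = np.unique(codes[:, j])
        blocks.append((codes[:, j][:, None] == levels[None, :]).astype(float))
    return np.hstack(blocks)


def pca_reduce(X, k):
    """PCA: center the columns, then project onto the top-k principal axes."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T


def svd_reduce(X, k):
    """Truncated SVD without centering, often used for sparse one-hot matrices."""
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * S[:k]


REDUCERS = {"pca": pca_reduce, "svd": svd_reduce}


def hybrid_reduce(X_cont, cat_codes, method_cont, k_cont, method_cat, k_cat):
    """Hypothetical hybrid step: reduce each variable group separately, then concatenate."""
    Z_cont = REDUCERS[method_cont](X_cont, k_cont)
    Z_cat = REDUCERS[method_cat](one_hot(cat_codes), k_cat)
    return np.hstack([Z_cont, Z_cat])


def genetic_search(fitness, k_max_cont, k_max_cat, pop_size=20, generations=30,
                   mutation_rate=0.2, seed=0):
    """Hypothetical GA over (method_cont, k_cont, method_cat, k_cat); maximizes fitness."""
    rng = np.random.default_rng(seed)
    methods = list(REDUCERS)

    def random_individual():
        return (methods[rng.integers(len(methods))], int(rng.integers(1, k_max_cont + 1)),
                methods[rng.integers(len(methods))], int(rng.integers(1, k_max_cat + 1)))

    cache = {}

    def score(ind):
        # Cache evaluations: each real evaluation would train and validate a model.
        if ind not in cache:
            cache[ind] = fitness(*ind)
        return cache[ind]

    population = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: keep the better half unchanged as parents (elitism).
        parents = sorted(population, key=score, reverse=True)[: pop_size // 2]
        children = []
        while len(children) < pop_size - len(parents):
            i, j = rng.choice(len(parents), size=2, replace=False)
            # Uniform crossover: take each gene from one of the two parents.
            child = tuple(parents[i][g] if rng.random() < 0.5 else parents[j][g]
                          for g in range(4))
            # Mutation: occasionally resample a single gene at random.
            if rng.random() < mutation_rate:
                g = int(rng.integers(4))
                child = child[:g] + (random_individual()[g],) + child[g + 1:]
            children.append(child)
        population = parents + children
    return max(population, key=score)
```

With the study's data, `X_cont` would hold the nine continuous variables and `cat_codes` the five categorical ones (whose one-hot expansion accounts for 37 - 9 = 28 of the 37 features), so `k_max_cont` and `k_max_cat` would be bounded by 9 and 28. Caching fitness values matters here because each evaluation implies a full model fit.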
Pages: 15