Evaluating influences of seasonal variations and anthropogenic activities on alluvial groundwater hydrochemistry using ensemble learning approaches

Cited by: 70
Authors
Singh, Kunwar P. [1 ,2 ]
Gupta, Shikha [1 ,2 ]
Mohan, Dinesh [3 ]
Affiliations
[1] Acad Sci & Innovat Res, New Delhi 110001, India
[2] CSIR, Div Environm Chem, CSIR Indian Inst Toxicol Res, Lucknow 226001, Uttar Pradesh, India
[3] Jawaharlal Nehru Univ, Sch Environm Sci, New Delhi 110067, India
Keywords
Ensemble learning; Decision tree forest; Decision treeboost; Groundwater hydrochemistry; Seasonal variations; Anthropogenic activity; SUPPORT VECTOR MACHINE; PREDICTION ACCURACY; AQUIFER; PERFORMANCE; SELECTION; BEHAVIOR; COMPLEX; INDIA; PLAIN;
DOI
10.1016/j.jhydrol.2014.01.004
CLC number
TU [Building Science]
Discipline classification code
0813
Abstract
The chemical composition and hydrochemistry of groundwater are influenced by seasonal variations and anthropogenic activities in a region. Understanding these influences and the factors responsible for them is vital for effective groundwater management. In this study, ensemble learning based classification and regression models are constructed and applied to groundwater hydrochemistry data from the Unnao and Ghaziabad regions of northern India. Accordingly, single decision tree (SDT), decision tree forest (DTF), and decision treeboost (DTB) models were constructed. The predictive and generalization abilities of the proposed models were investigated using several statistical parameters and compared with the support vector machines (SVM) method. The DT and SVM models discriminated the groundwater in shallow and deep aquifers, industrial and non-industrial areas, and pre- and post-monsoon seasons, with misclassification rates (MR) of 1.52-14.92% (SDT), 0.91-6.52% (DTF), 0.61-5.27% (DTB), and 1.52-11.69% (SVM). For the complete data array of Ghaziabad, the respective regression models yielded a correlation between measured and predicted values of COD and a root mean squared error of 0.874 and 0.66 (SDT); 0.952 and 0.48 (DTF); 0.943 and 0.52 (DTB); and 0.785 and 0.85 (SVR). The DTF and DTB models outperformed the SVM in both classification and regression. Incorporating the bagging and stochastic gradient boosting algorithms in the DTF and DTB models, respectively, enhanced their predictive ability. The proposed ensemble models successfully delineated the influences of seasonal variations and anthropogenic activities on groundwater hydrochemistry and can be used as effective tools for forecasting the chemical composition of groundwater for its management. (C) 2014 Elsevier B.V. All rights reserved.
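The abstract reports three evaluation statistics: misclassification rate (MR) for the classifiers, and correlation and root mean squared error for the regression models. A minimal sketch of how such statistics are commonly computed is given below. It is not the authors' code: the function names and the toy values are hypothetical, and taking the correlation to be the Pearson coefficient is an assumption, since the abstract does not say which coefficient was used.

```python
import math


def misclassification_rate(y_true, y_pred):
    """Return the percentage of samples assigned to the wrong class."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return 100.0 * wrong / len(y_true)


def rmse(measured, predicted):
    """Return the root mean squared error between two equal-length sequences."""
    if len(measured) != len(predicted) or len(measured) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(m - p) ** 2 for m, p in zip(measured, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def pearson_r(measured, predicted):
    """Return the Pearson correlation (assumed here; the abstract only says 'correlation')."""
    if len(measured) != len(predicted) or len(measured) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    n = len(measured)
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_m * var_p)


# Made-up illustrative values, not data from the study.
season_true = ["pre", "pre", "post", "post"]
season_pred = ["pre", "post", "post", "post"]
cod_measured = [10.0, 20.0, 30.0]
cod_predicted = [12.0, 18.0, 33.0]

print("MR (%):", misclassification_rate(season_true, season_pred))
print("RMSE:", rmse(cod_measured, cod_predicted))
print("r:", pearson_r(cod_measured, cod_predicted))
```

A lower MR and RMSE and a correlation closer to 1 indicate a better model, which is the sense in which the abstract ranks DTF and DTB above SVM.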
Pages: 254-266
Page count: 13