Application of classification methods to analyze chemicals in drinking water quality

Cited by: 4
Authors
Azam, Muhammad [1 ]
Arshad, Asma [2 ]
Aslam, Muhammad [3 ]
Gulzar, Sadia [4 ]
Affiliations
[1] Univ Vet & Anim Sci, Dept Stat & Comp Sci, Lahore, Pakistan
[2] Natl Coll Business Adm & Econ, Lahore, Pakistan
[3] King Abdulaziz Univ, Fac Sci, Dept Stat, Jeddah 21551, Saudi Arabia
[4] Kinnaird Coll Women, Dept Stat, Lahore, Pakistan
Keywords
Classification trees; CHAID; ECHAID; CRT; QUEST; Cross-validation; TREE;
DOI
10.1007/s00769-018-01369-1
Chinese Library Classification number
O65 [Analytical Chemistry];
Discipline classification code
070302 ; 081704 ;
Abstract
Various statistical methods, including discriminant analysis, logistic regression and cluster analysis, have been applied to drinking water datasets to construct models that identify important input variables. Decision trees are more flexible than other statistical classification methods because they provide a complete path to a specific decision and make the critical variables simple and easy to understand. This article describes the application of classification decision trees to the analysis of variables affecting drinking water quality, discusses the trees produced by several methods, and compares them to find the best approach for further analysis of the study area. Samples of filtered water were taken from 100 pumps located in different union councils of the city of Lahore. The classification trees were constructed from the input quality variables, and the results are reported as confusion matrices. Four techniques were used: Chi-square Automatic Interaction Detector (CHAID), Exhaustive Chi-square Automatic Interaction Detector (ECHAID), Classification and Regression Tree (CRT) and Quick Unbiased Efficient Statistical Tree (QUEST). Three experiments were conducted to evaluate the models by the number of misclassified units: the first used the complete dataset, the second was based on cross-validation, and the third was based on random subsampling.
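The three evaluation experiments named in the abstract (complete dataset, cross-validation, random subsampling) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic (not the Lahore samples), the feature names in the comments are hypothetical, and the classifier is a one-split tree chosen by Gini impurity, in the spirit of CRT, rather than any of the four methods as the authors ran them. Each experiment returns a confusion matrix, from which the number of misclassified units is read off.

```python
import random

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def majority(labels):
    """Majority class of 0/1 labels (ties go to class 1)."""
    return int(2 * sum(labels) >= len(labels))

def fit_stump(X, y):
    """One-split tree: choose the (feature, threshold) with lowest weighted Gini."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, j, t, majority(left), majority(right))
    if best is None:  # no valid split: always predict the overall majority
        m = majority(y)
        return (0, float("inf"), m, m)
    return best[1:]

def predict(stump, row):
    j, t, left_label, right_label = stump
    return left_label if row[j] <= t else right_label

def confusion_matrix(y_true, y_pred):
    """2x2 matrix indexed as m[actual][predicted]."""
    m = [[0, 0], [0, 0]]
    for a, p in zip(y_true, y_pred):
        m[a][p] += 1
    return m

def misclassified(m):
    """Off-diagonal count of a 2x2 confusion matrix."""
    return m[0][1] + m[1][0]

def evaluate(X, y, train_idx, test_idx):
    stump = fit_stump([X[i] for i in train_idx], [y[i] for i in train_idx])
    preds = [predict(stump, X[i]) for i in test_idx]
    return confusion_matrix([y[i] for i in test_idx], preds)

def resubstitution(X, y):
    """Experiment 1: fit and test on the complete dataset."""
    idx = list(range(len(y)))
    return evaluate(X, y, idx, idx)

def k_fold(X, y, k, rng):
    """Experiment 2: k-fold cross-validation, confusion matrices summed over folds."""
    idx = list(range(len(y)))
    rng.shuffle(idx)
    total = [[0, 0], [0, 0]]
    for f in range(k):
        test = idx[f::k]
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        m = evaluate(X, y, train, test)
        for a in range(2):
            for p in range(2):
                total[a][p] += m[a][p]
    return total

def random_subsampling(X, y, test_fraction, rng):
    """Experiment 3: one random train/test split."""
    idx = list(range(len(y)))
    rng.shuffle(idx)
    n_test = round(len(idx) * test_fraction)
    return evaluate(X, y, idx[n_test:], idx[:n_test])

def make_synthetic_data(seed, n=100):
    """Synthetic stand-in data: two hypothetical features (say pH and turbidity)
    and a noisy binary quality label. Not the study's measurements."""
    rng = random.Random(seed)
    X = [[rng.uniform(6.0, 9.0), rng.uniform(0.0, 10.0)] for _ in range(n)]
    y = [int(row[1] > 5.0) if rng.random() > 0.1 else int(row[1] <= 5.0) for row in X]
    return X, y

X, y = make_synthetic_data(seed=0)
for name, m in [
    ("complete dataset", resubstitution(X, y)),
    ("10-fold cross-validation", k_fold(X, y, 10, random.Random(1))),
    ("random subsampling (30% test)", random_subsampling(X, y, 0.3, random.Random(2))),
]:
    print(name, m, "misclassified:", misclassified(m))
```

Testing on the complete dataset reuses the training units and so tends to understate the error, which is one reason to compare it with the two held-out schemes.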
Pages: 227-235
Number of pages: 9
Related papers
17 in total
[1]  
Archer KJ, 2010, J STAT SOFTW, V34, P1
[2]   Comparison of self-organizing maps classification approach with cluster and principal components analysis for large environmental data sets [J].
Astel, A. ;
Tsakouski, S. ;
Barbieri, P. ;
Simeonov, V. .
WATER RESEARCH, 2007, 41 (19) :4566-4578
[3]  
Azam M, 2007, P 9 ISL COUNTR C STA
[4]   Comparisons of decision tree methods using water data [J].
Azam, Muhammad ;
Aslam, Muhammad ;
Khan, Khushnoor ;
Mughal, Anwar ;
Inayat, Awais .
COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION, 2017, 46 (04) :2924-2934
[5]   Comparison of the mineral content of tap water and bottled waters [J].
Azoulay A. ;
Garzon P. ;
Eisenberg M.J. .
Journal of General Internal Medicine, 2001, 16 (3) :168-175
[6]   Evaluating the environmental impact of various dietary patterns combined with different food production systems [J].
Baroni, L. ;
Cenci, L. ;
Tettamanti, M. ;
Berati, M. .
EUROPEAN JOURNAL OF CLINICAL NUTRITION, 2007, 61 (02) :279-286
[7]  
Breiman L., 1984, BIOMETRICS, V1st ed.
[8]   Classification of bathing water quality based on the parametric calculation of percentiles is unsound [J].
Chawla, R ;
Hunter, PR .
WATER RESEARCH, 2005, 39 (18) :4552-4558
[9]  
Chin-Sheng Huang, 2008, WSEAS Transactions on Computers, V7, P1679
[10]   A comparison of water quality indices for coastal water [J].
Gupta, AK ;
Gupta, SK ;
Patil, RS .
JOURNAL OF ENVIRONMENTAL SCIENCE AND HEALTH PART A-TOXIC/HAZARDOUS SUBSTANCES & ENVIRONMENTAL ENGINEERING, 2003, 38 (11) :2711-2725