Predicting river water quality index using data mining techniques

Cited by: 0
Authors
Richa Babbar
Sakshi Babbar
Affiliations
[1] Thapar University,Department of Civil Engineering
[2] GD Goenka University,School of Engineering
Source
Environmental Earth Sciences | 2017 / Vol. 76
Keywords
Water quality parameters; Water quality index; Overall Index of Pollution; k-fold cross-validation; Data mining classifiers
DOI
Not available
Abstract
This paper demonstrates the application of data mining techniques to predicting a river water quality index. The usefulness of these techniques lies in the automated extraction of novel knowledge from data to improve decision-making. Six popular classification techniques (k-nearest neighbor, decision trees, Naive Bayes, artificial neural networks, rule-based classifiers, and support vector machines) were used to build a predictive environment that classifies water quality into understandable terms based on the Overall Index of Pollution. Experiments were conducted on two types of data sets: synthetic and real. A repeated k-fold cross-validation procedure was followed to design the learning and testing frameworks of the predictive environment. Based on the validation results, the error rates in identifying the true water quality class on the synthetic and real data sets were, respectively, 20% and 28% for k-nearest neighbor, 11% and 24% for Naive Bayes, 1% and 38% for the artificial neural network, and 10% and 20% for the rule-based classifier. The decision tree and support vector machine classifiers were found to be the best predictive models, with 0% error rates during automated extraction of the water quality class. This study shows that data mining techniques have the potential to predict water quality class quickly, provided that the data given are a true representation of the domain knowledge.
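The repeated k-fold cross-validation procedure mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`repeated_kfold_indices`, `knn_predict`, `cv_error_rate`), the two-feature toy samples, and the class labels "Acceptable" and "Polluted" are all invented here for illustration. A simple k-nearest neighbor classifier stands in for the six classifiers compared in the paper, and the Overall Index of Pollution itself is not computed.

```python
import random
from collections import Counter


def repeated_kfold_indices(n_samples, k=5, repeats=3, seed=0):
    """Yield (train_idx, test_idx) pairs for `repeats` independently shuffled k-fold splits."""
    rng = random.Random(seed)
    for _ in range(repeats):
        order = list(range(n_samples))
        rng.shuffle(order)
        # Deal the shuffled indices into k roughly equal folds.
        folds = [order[i::k] for i in range(k)]
        for i in range(k):
            test_idx = folds[i]
            train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
            yield train_idx, test_idx


def knn_predict(train_X, train_y, x, n_neighbors=3):
    """Classify x by majority vote among its nearest training points (squared Euclidean distance)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:n_neighbors])
    return votes.most_common(1)[0][0]


def cv_error_rate(X, y, k=5, repeats=3, seed=0):
    """Fraction of misclassified test samples, pooled over every fold of every repeat."""
    errors = total = 0
    for train_idx, test_idx in repeated_kfold_indices(len(X), k, repeats, seed):
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        for i in test_idx:
            errors += knn_predict(train_X, train_y, X[i]) != y[i]
            total += 1
    return errors / total


# Invented toy data (NOT from the paper): two well-separated clusters of
# hypothetical two-parameter samples, one cluster per made-up quality class.
X = [(8.0 + 0.1 * i, 2.0 + 0.1 * i) for i in range(10)] + \
    [(3.0 - 0.1 * i, 12.0 + 0.2 * i) for i in range(10)]
y = ["Acceptable"] * 10 + ["Polluted"] * 10

print(f"Repeated 5-fold CV error rate: {cv_error_rate(X, y):.2%}")
```

Each sample is used for testing exactly once per repeat, so the pooled error rate averages over several different partitions of the same data, which makes the estimate less sensitive to any single split. Because the toy clusters are far apart, the error rate here is trivially low; it says nothing about the figures reported in the abstract.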