Evaluating statistical model performance in water quality prediction

Cited by: 89
Authors
Avila, Rodelyn [1, 2]
Horn, Beverley [2 ]
Moriarty, Elaine [2 ]
Hodson, Roger [3 ]
Moltchanova, Elena [1 ]
Affiliations
[1] Univ Canterbury, Sch Math & Stat, Private Bag 4800, Christchurch 8140, New Zealand
[2] ESR, Inst Environm Sci & Res, POB 29181, Christchurch 8540, New Zealand
[3] Environm Southland, Private Bag 90116, Invercargill 9840, New Zealand
Keywords
Water quality prediction; E. coli; Statistical models; Bayesian networks; ESCHERICHIA-COLI; SURVIVAL; HEALTH; MPN
DOI
10.1016/j.jenvman.2017.11.049
Chinese Library Classification (CLC) number
X [Environmental science, safety science]
Discipline classification code
08; 0830
Abstract
Exposure to contaminated water while swimming, boating or participating in other recreational activities can cause gastrointestinal and respiratory disease. It is not uncommon for water bodies to experience rapid fluctuations in water quality, and it is therefore vital to be able to predict them accurately and in time so as to minimise the population's exposure to pathogenic organisms. E. coli is commonly used as an indicator to measure water quality in freshwater, and higher counts of E. coli are associated with increased risk of illness. In this case study, we compare the performance of a wide range of statistical models in predicting water quality via E. coli levels for the weekly data collected over the summer months from 2006 to 2014 at the recreational site on the Oreti river in Wallacetown, New Zealand. The models include a naive model, multiple linear regression, dynamic regression, regression tree, Markov chain, classification tree, random forests, multinomial logistic regression, discriminant analysis and Bayesian network. The results show that the Bayesian network was superior to all the other models. Overall, it had a leave-one-out and k-fold cross validation error rate of 21%, while predicting the majority of instances of E. coli levels classified as unsafe by the Microbiological Water Quality Guidelines for Marine and Freshwater Recreational Areas 2003, New Zealand. Because Bayesian networks are also flexible in handling missing data and outliers and allow for continuous updating in real time, we have found them to be a promising tool, and in the future, plan to extend the analysis beyond the current case study site. (C) 2017 Elsevier Ltd. All rights reserved.
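The abstract reports leave-one-out and k-fold cross validation error rates. As a rough illustration of how such an error rate can be computed, the sketch below scores a majority-class baseline on a short list of water quality categories. Everything in it is an assumption made for illustration: the abstract does not define its naive model, the function names are hypothetical, the interleaved fold assignment is one choice among several, and the labels are made up rather than taken from the study.

```python
from collections import Counter


def majority_class(labels):
    """Return the most frequent label (ties go to the label seen first)."""
    return Counter(labels).most_common(1)[0][0]


def kfold_error_rate(labels, k):
    """Misclassification rate of a majority-class predictor under k-fold CV.

    Folds are interleaved (index i goes to fold i % k). Setting
    k == len(labels) gives leave-one-out cross validation.
    """
    n = len(labels)
    errors = 0
    for fold in range(k):
        test_idx = set(range(fold, n, k))
        # Fit the baseline on every observation outside the held-out fold.
        train = [y for i, y in enumerate(labels) if i not in test_idx]
        guess = majority_class(train)
        # Count held-out observations the baseline gets wrong.
        errors += sum(1 for i in test_idx if labels[i] != guess)
    return errors / n


# Hypothetical weekly categories, NOT data from the study.
weekly = ["safe", "safe", "alert", "safe", "unsafe",
          "safe", "safe", "alert", "safe", "safe"]

loo_error = kfold_error_rate(weekly, len(weekly))  # leave-one-out
five_fold_error = kfold_error_rate(weekly, 5)      # 5-fold
```

A baseline like this never predicts the minority "unsafe" category, which is why an overall error rate is usually read alongside how many unsafe instances a model actually catches.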
Pages: 910-919
Number of pages: 10