Decision tree models in predicting water quality parameters of dissolved oxygen and phosphorus in lake water

Cited by: 24
Authors
Gorgan-Mohammadi, Faezeh [1]
Rajaee, Taher [1]
Zounemat-Kermani, Mohammad [2]
Affiliations
[1] Univ Qom, Dept Civil Engn, Qom, Iran
[2] Shahid Bahonar Univ Kerman, Dept Water Engn, Kerman, Iran
Funding
UK Research and Innovation;
Keywords
Water quality; Hydrochemical parameters; Machine learning; Data mining; Decision tree;
DOI
10.1007/s40899-022-00776-0
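Chinese Library Classification number
TV21 [Water resources survey and water conservancy planning];
Discipline classification code
081501 ;
Abstract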
Water quality is an important issue because it affects humans and other living organisms, and predicting water quality parameters supports better management of water resources. The decision tree is a data mining method that creates rules for classifying and predicting data using a tree structure. The purpose of this study is to use data mining techniques to investigate and predict soluble phosphorus and dissolved oxygen in Lake Erie. To this end, the Classification And Regression Tree (CART) model is compared with the Chi-squared Automatic Interaction Detector (CHAID) model, and the Quick Unbiased Efficient Statistical Trees (QUEST) model is compared with the C5 model. The models are compared and reviewed to assess their applicability for identifying water quality parameters. The results show that decision tree methods, using hydrochemical parameters as inputs, can classify and predict water quality with high accuracy and in a short time. The data set contains 327 records. Model accuracy is checked using the difference between the observed and the predicted data. In the prediction of dissolved oxygen, 214 cases with the CART model and 185 cases with the CHAID model differ by less than 2 units from the observed data. For phosphorus, the predictions differ from the observed data by less than 0.2 in 245 cases with the CART model and 237 cases with the CHAID model. Therefore, the accuracy of the CART model is better. With the C5 algorithm, the group number is predicted correctly in 256 cases for phosphorus and in 230 cases for dissolved oxygen. Overall, the CART model is better than the CHAID model in predicting data values, and the C5 model is better than the QUEST model in predicting group numbers.
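The accuracy check described in the abstract (counting cases where the prediction lies within a fixed distance of the observation) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and the sample dissolved oxygen values are hypothetical, and expressing the reported counts as a share of 327 assumes that all 327 records were used in the comparison, which the abstract does not state.

```python
def count_within_tolerance(observed, predicted, tolerance):
    """Count cases where |observed - predicted| is strictly below tolerance."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    return sum(1 for o, p in zip(observed, predicted) if abs(o - p) < tolerance)


def share_of_total(count, total):
    """Express a count as a percentage of the total number of cases."""
    return 100.0 * count / total


# Hypothetical dissolved oxygen values, invented for illustration only.
observed_do = [8.1, 9.4, 6.7, 10.2, 7.5]
predicted_do = [7.0, 9.9, 9.1, 10.0, 5.4]
hits = count_within_tolerance(observed_do, predicted_do, tolerance=2.0)
print(hits, "of", len(observed_do), "cases within 2 units")

# Counts reported in the abstract, as a share of the 327 records
# (assumption: every record was part of the comparison).
reported = [("CART, dissolved oxygen", 214), ("CHAID, dissolved oxygen", 185),
            ("CART, phosphorus", 245), ("CHAID, phosphorus", 237)]
for label, count in reported:
    print(label, round(share_of_total(count, 327), 1), "%")
```

Under that assumption, the counts correspond to roughly 65% (CART) versus 57% (CHAID) of cases for dissolved oxygen, and 75% versus 72% for phosphorus.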
Pages: 13