Customised-sampling approach for pipe failure prediction in water distribution networks

Cited by: 5
Authors
Latifi, Milad [1]
Zali, Ramiz Beig [1]
Javadi, Akbar A. [1]
Farmani, Raziyeh [1]
Affiliations
[1] Univ Exeter, Ctr Water Syst, Exeter, England
Funding
Innovate UK project;
Keywords
Failure prediction in pipes; Water distribution networks; Machine learning; Imbalance class data; Under-sampling; Over-sampling; Class weighting; CLASS IMBALANCE; RELIABILITY; CHALLENGES; SMOTE;
DOI
10.1038/s41598-024-69109-9
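Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Subject classification codes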
07 ; 0710 ; 09 ;
Abstract
This paper presents a new methodology for handling imbalanced class data in failure prediction for Water Distribution Networks (WDNs). The methodology builds on three existing strategies: under-sampling, over-sampling, and class weighting. All three treat an imbalanced dataset by adjusting the representation of the minority and majority classes. Under-sampling removes data from the majority class, over-sampling adds data to the minority class, and class weighting assigns unequal weights based on class counts to balance the influence of each class during machine learning (ML) model training. In this paper, these approaches were applied at levels other than the "balance point" to construct pipe failure prediction models for a WDN with highly imbalanced data. F1-score and AUC-ROC were selected to evaluate model performance. Results showed that under-sampling to a level above the balance point yielded the highest F1-score, while over-sampling performed best at a level below the balance point. When class weights were applied during training and prediction, weights lower than the balancing values proved more effective. Combining under-sampling and over-sampling to the same ratio for both majority and minority classes showed limited improvement. However, a more effective predictive model emerged when the minority class was over-sampled and the majority class under-sampled to different ratios, followed by class weighting to balance the data.
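The three strategies named in the abstract can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the authors' implementation: the function names, the `target_ratio` convention (majority count divided by minority count, with 1.0 taken here as the "balance point"), and the weight formula n / (k * n_c) are all assumptions chosen for the example. Random duplication stands in for over-sampling; the record does not say which over-sampling method the paper uses.

```python
import random
from collections import Counter


def resample_to_ratio(labels, minority=1, target_ratio=1.0, mode="under", seed=0):
    """Return sorted row indices whose majority/minority count is about target_ratio.

    Assumption: target_ratio = 1.0 is treated as the balance point, so
    target_ratio > 1.0 leaves the majority class larger than the minority.
    """
    rng = random.Random(seed)
    min_idx = [i for i, y in enumerate(labels) if y == minority]
    maj_idx = [i for i, y in enumerate(labels) if y != minority]
    if mode == "under":
        # Randomly drop majority rows until the requested ratio is reached.
        keep = min(len(maj_idx), round(target_ratio * len(min_idx)))
        maj_idx = rng.sample(maj_idx, keep)
    elif mode == "over":
        # Randomly duplicate minority rows until the requested ratio is reached.
        want = max(len(min_idx), round(len(maj_idx) / target_ratio))
        min_idx = min_idx + rng.choices(min_idx, k=want - len(min_idx))
    else:
        raise ValueError("mode must be 'under' or 'over'")
    return sorted(maj_idx + min_idx)


def balanced_class_weights(labels):
    """Hypothetical 'balancing' weights: n / (k * n_c) for each class c."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}


# Toy data: 950 non-failed pipes (0) and 50 failed pipes (1).
labels = [0] * 950 + [1] * 50
under_idx = resample_to_ratio(labels, target_ratio=2.0, mode="under")
over_idx = resample_to_ratio(labels, target_ratio=5.0, mode="over")
weights = balanced_class_weights(labels)
```

With these toy labels, under-sampling to a ratio of 2.0 keeps 100 majority rows alongside the 50 minority rows, and over-sampling to a ratio of 5.0 grows the minority class to 190 rows. Scaling the minority weight below the value returned by `balanced_class_weights` would correspond to the "lower weights than the balance" setting the abstract describes.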
Pages: 19