Machine learning methods for anomaly classification in wastewater treatment plants

Cited by: 20
Authors
Bellamoli, Francesca [1, 2]
Di Iorio, Mattia [3 ]
Vian, Marco [2 ]
Melgani, Farid [1 ]
Affiliations
[1] Univ Trento, Dept Informat Engn & Comp Sci, Via Sommar 9, I-38123 Trento, Italy
[2] ETC Sustainable Solut Srl, Via Palustei 16, I-38121 Trento, Italy
[3] D3 Srl, Via Palustei 16, I-38121 Trento, Italy
Keywords
Anomaly detection; Intermittent aeration; Multiclass classification; Supervised machine learning; Wastewater treatment plants; Gradient boosting
DOI
10.1016/j.jenvman.2023.118594
Chinese Library Classification number
X [Environmental science, safety science]
Discipline classification code
08; 0830
Abstract
Modern wastewater treatment plants base their biological processes on advanced control systems which ensure compliance with discharge limits and minimize energy consumption responding to information from on-line probes. The correct readings of probes are particularly crucial for intermittent aeration controllers, which rely on real-time measurements of ammonia and oxygen in biological tanks. These data are also an important resource for developing artificial intelligence algorithms that can identify process or sensor anomalies, thus guiding the choices of plant operators and automatic process controllers. However, using anomaly detection and classification algorithms in real-time wastewater treatment is challenging because of the noisy nature of sensor measurements, the difficulty of obtaining labeled real-plant data, and the complex and interdependent mechanisms that govern biological processes. This work aims at thoroughly exploring the performance of machine learning methods in detecting and classifying the main anomalies in plants operating with intermittent aeration. Using oxygen, ammonia and aeration power measurements from a set of plants in Italy, we perform both binary and multiclass classification, and we compare them through a rigorous validation procedure that includes a test on an unknown dataset, proposing a new evaluation protocol. The classification methods explored are support vector machine, multilayer perceptron, random forest, and two gradient boosting methods (LightGBM and XGBoost). The best performance was achieved using the gradient boosting ensemble algorithms, with up to 96% of anomalies detected and up to 84% and 62% of anomalies classified correctly on the first and second datasets respectively.
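The abstract reports two kinds of figures: the share of anomalies detected (binary view) and the share of anomalies classified correctly (multiclass view). The sketch below shows one plausible way to compute both from a single set of multiclass predictions. It is an illustration under stated assumptions, not the authors' evaluation protocol: the metric definitions, the label names (`normal`, `sensor_fault`, `process_fault`), and the toy data are all hypothetical and do not come from the paper.

```python
from typing import List, Sequence, Tuple

NORMAL = "normal"  # hypothetical label for non-anomalous samples


def _anomalous_pairs(y_true: Sequence[str], y_pred: Sequence[str]) -> List[Tuple[str, str]]:
    """Keep only the (true, predicted) pairs whose true label is an anomaly."""
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if t != NORMAL]
    if not pairs:
        raise ValueError("no anomalous samples in y_true")
    return pairs


def detection_rate(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Binary view: fraction of true anomalies predicted as any anomaly class."""
    pairs = _anomalous_pairs(y_true, y_pred)
    return sum(1 for _, p in pairs if p != NORMAL) / len(pairs)


def classification_rate(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Multiclass view: fraction of true anomalies assigned their exact class."""
    pairs = _anomalous_pairs(y_true, y_pred)
    return sum(1 for t, p in pairs if t == p) / len(pairs)


# Toy labels invented for illustration only (not data from the paper).
y_true = ["normal", "sensor_fault", "sensor_fault", "process_fault", "normal", "process_fault"]
y_pred = ["normal", "sensor_fault", "process_fault", "process_fault", "sensor_fault", "normal"]

print(detection_rate(y_true, y_pred))       # 3 of 4 anomalies flagged
print(classification_rate(y_true, y_pred))  # 2 of 4 anomalies given the right class
```

Under these definitions the classification rate can never exceed the detection rate, since an anomaly must be flagged before it can be assigned the right class. That ordering matches the figures reported in the abstract.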
Pages: 10