Multi-class random forest model to classify wastewater treatment imbalanced data

Cited by: 2
Authors
Distefano, Veronica [1 ,2 ]
Palma, Monica [1 ,3 ]
De Iaco, Sandra [1 ,3 ,4 ]
Affiliations
[1] Univ Salento, Dept Econ Sci, Lecce, Italy
[2] Ca' Foscari Univ Venice, European Ctr Living Technol ECLT, Venice, Italy
[3] Natl Ctr HPC Big Data & Quantum Comp, Bologna, Italy
[4] Natl Biodivers Future Ctr, Palermo, Italy
Keywords
Multi-classification; Data imbalance; Resampling approach; Treatment plant sections; Electronic nose; Machine learning; CLASSIFICATION; SMOTE
DOI
10.1016/j.seps.2024.102021
Chinese Library Classification (CLC)
F [Economics]
Discipline classification code
02
Abstract
The odor emissions generated by treatment plants raise complex environmental and economic issues. Modern instrumental odor monitoring systems, based on an array of several sensors, continuously record the gaseous compounds. However, they are characterized by poor selectivity, which compromises the ability to discriminate and identify the emission sources. In this paper, the ability of odor sensors to distinguish between the treatment plant sections generating the gaseous compounds is evaluated on the basis of the random forest classifier and compared with the performance of discriminant analysis. Since a multi-parametric system of sensors can be affected by a small sample size with imbalanced classes, several strategies for data balancing are proposed and analyzed. The findings show that the random forest classifier distinguishes the emission sources better than classical multiple discriminant analysis in terms of all evaluation metrics. This is also confirmed for different resampling techniques, especially in the over-sampling case. The analysis uses measurements from 10 sensors of multi-parametric odor monitoring systems, collected from a company specialized in environmental assistance.
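The data-balancing step mentioned in the abstract can be illustrated with a minimal sketch of plain random over-sampling. This is not the authors' procedure (the keywords also mention SMOTE, which synthesizes new points rather than duplicating existing ones); the function name `random_oversample`, the section labels, and the toy sensor readings are all hypothetical and chosen only for illustration.

```python
import random
from collections import Counter, defaultdict


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of the smaller classes until
    every class has as many rows as the largest class."""
    rng = random.Random(seed)

    # Group the feature rows by class label.
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)

    # Every class is grown to the size of the largest one.
    target = max(len(rows) for rows in by_class.values())

    X_bal, y_bal = [], []
    for label, rows in by_class.items():
        # Draw the missing rows with replacement from the same class.
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for row in rows + extra:
            X_bal.append(row)
            y_bal.append(label)
    return X_bal, y_bal


# Hypothetical toy data: 10 sensor readings per sample, 3 plant sections
# with deliberately imbalanced class sizes (assumed, not from the paper).
rng = random.Random(42)
sizes = {"section_A": 30, "section_B": 8, "section_C": 4}
X = [[rng.random() for _ in range(10)] for n in sizes.values() for _ in range(n)]
y = [label for label, n in sizes.items() for _ in range(n)]

X_bal, y_bal = random_oversample(X, y)
print(Counter(y))      # class counts before balancing
print(Counter(y_bal))  # class counts after balancing
```

In a classification workflow, resampling of this kind is normally applied to the training split only, so that duplicated rows do not leak into the data used for evaluation.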
Pages: 10