Exploiting synthetic data generation to enhance pollution prediction

Cited by: 0
Authors
Morales-Garcia, Juan [1]
Ramos-Sorroche, Emilio [2]
Balderas-Diaz, Sara [3]
Guerrero-Contreras, Gabriel [3]
Munoz, Andres [3]
Santa, Jose [2]
Terroso-Saenz, Fernando [2]
Affiliations
[1] Univ Alicante, Dept Software & Comp Syst, Alicante, Spain
[2] Tech Univ Cartagena, Sch Telecommun Engn, Cartagena, Spain
[3] Univ Cadiz, Dept Comp Engn, Cadiz, Spain
Keywords
Synthetic data; Pollution; Machine learning; Deep learning; Forecasting; NETWORKS;
DOI
10.1016/j.asoc.2025.113076
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Pollution in urban areas is becoming a primary concern for local governments in developed nations around the globe. Large volumes of data are currently collected for this purpose from smart developments equipped with atmospheric and climatic sensors. An active research line exploits such data to extract patterns and predict pollution levels, so that countermeasures can be taken beforehand and exposure to harmful concentrations avoided. However, a key issue is the lack of significant data, caused by incomplete smart infrastructures or sensor calibration problems. To address this, we propose exploiting synthetic data generation to enhance pollution prediction from limited data sources, specifically extending two weeks of real measurements by up to ten extra years. We present a data generation approach based on Generative Adversarial Networks (GANs), with a particular model focused on generating artificial pollution data, which is later exploited using different Machine Learning (ML) algorithms. Results indicate that synthetic data further improves prediction when used as the base dataset for a model that is later fine-tuned using real records. For 62% of pollutants, this data-mixing strategy (one of five evaluated) provides the best results. This effect is attributed to extra model robustness from data regularization and to better generalization, obtained by avoiding sensor limitations in real deployments.
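The strategy the abstract reports as most effective (train on synthetic data first, then fine-tune on real records) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the model is a plain linear regressor trained by gradient descent rather than the authors' ML algorithms, the "synthetic" set is random data standing in for GAN output, and the names `fit_linear`, `pretrain_then_finetune`, and `mse` are hypothetical.

```python
import numpy as np


def mse(w, X, y):
    """Mean squared error of the linear model X @ w against targets y."""
    return float(np.mean((X @ w - y) ** 2))


def fit_linear(X, y, w=None, epochs=200, lr=0.01):
    """Gradient descent on MSE; starts from w when given, else from zeros."""
    w = np.zeros(X.shape[1]) if w is None else np.array(w, dtype=float)
    n = len(y)
    for _ in range(epochs):
        grad = 2.0 / n * X.T @ (X @ w - y)  # gradient of the MSE loss
        w -= lr * grad
    return w


def pretrain_then_finetune(X_syn, y_syn, X_real, y_real,
                           finetune_epochs=50, finetune_lr=0.005):
    """Hypothetical two-stage scheme: synthetic base model, real-data fine-tuning."""
    # Stage 1: learn a base model from the large synthetic dataset.
    w_pre = fit_linear(X_syn, y_syn)
    # Stage 2: continue training from those weights on the few real records,
    # using fewer epochs and a smaller step so the base model is only adjusted.
    w_ft = fit_linear(X_real, y_real, w=w_pre,
                      epochs=finetune_epochs, lr=finetune_lr)
    return w_pre, w_ft
```

In this sketch the fine-tuning stage uses a smaller learning rate and fewer epochs than pretraining, a common (assumed) choice so that the scarce real records refine the synthetic-data model rather than overwrite it.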
Pages: 22