Automatic Quality Control of Crowdsourced Rainfall Data With Multiple Noises: A Machine Learning Approach

Cited by: 11
Authors
Niu, Geng [1, 2, 3]
Yang, Pan [3]
Zheng, Yi [2, 4]
Cai, Ximing [3]
Qin, Huapeng [1]
Affiliations
[1] Peking Univ, Key Lab Urban Habitat Environm Sci & Technol, Shenzhen Grad Sch, Sch Environm & Energy, Shenzhen, Peoples R China
[2] Southern Univ Sci & Technol, Sch Environm Sci & Engn, Shenzhen, Peoples R China
[3] Univ Illinois, Dept Civil & Environm Engn, Champaign, IL 61820 USA
[4] Southern Univ Sci & Technol, Shenzhen Municipal Engn Lab Environm IoT Technol, Shenzhen, Peoples R China
Funding
National Science Foundation (USA)
Keywords
crowdsourcing rainfall; machine learning; quality control; transferability; NEURAL-NETWORKS; ALGORITHM; VALIDATION; REGIONS;
DOI
10.1029/2020WR029121
Chinese Library Classification
X [Environmental Science, Safety Science]
Subject classification codes
08; 0830
Abstract
In geophysics, crowdsourcing is an emerging nontraditional environmental monitoring approach that supports data acquisition from individual citizens. However, because of the involvement of undertrained citizens and imprecise low-cost sensors, crowdsourced data applications suffer from different types of noises that can deteriorate the overall monitoring accuracy. In this study, we propose a machine learning approach for automatic crowdsourced data quality control (CSQC) that detects and removes noisy data inputs in spatially and temporally discrete crowdsourced observations coming from both fixed-point sensors (e.g., surveillance cameras) and moving sensors (e.g., moving cars/pedestrians). We design a set of features from original and interpolated rainfall data and use them to train and test the CSQC models using both supervised and unsupervised machine learning algorithms. The performances of the CSQC models under various scenarios assuming no retraining are also tested (hereafter referred to as transferability). The results based on synthetic but realistic data show that the CSQC models can significantly reduce the overall rainfall estimate errors. Under the stationary assumption, the CSQC models based on both supervised and unsupervised algorithms perform well in noisy data identification and overall rainfall estimation error reduction; however, if the model is transferred to other cities with different rainfall patterns or noise compositions (without retraining), supervised multilayer perceptrons (MLPs) show the best performance.
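To make the idea of features built from original and interpolated rainfall data concrete, here is a minimal sketch. It is not the authors' CSQC implementation: the inverse-distance-weighted (IDW) interpolation, the leave-one-out scheme, the function names, and the fixed residual threshold are all assumptions chosen for illustration, and the threshold merely stands in for the trained supervised or unsupervised models the abstract describes.

```python
import math


def idw_estimate(target, neighbors, power=2.0):
    """Inverse-distance-weighted rainfall estimate at target (x, y).

    neighbors is a list of (x, y, rainfall) tuples. IDW is an assumed
    interpolation method; the abstract does not name one.
    """
    num = 0.0
    den = 0.0
    for x, y, rain in neighbors:
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            return rain  # co-located observation: use its value directly
        w = 1.0 / d ** power
        num += w * rain
        den += w
    return num / den


def residual_features(observations, power=2.0):
    """Return (observed, interpolated, abs residual) for each observation.

    The interpolated value is a leave-one-out estimate, so each reading is
    compared against what its neighbors alone would predict.
    """
    feats = []
    for i, (x, y, rain) in enumerate(observations):
        others = observations[:i] + observations[i + 1:]
        est = idw_estimate((x, y), others, power)
        feats.append((rain, est, abs(rain - est)))
    return feats


def flag_noisy(observations, threshold, power=2.0):
    """Flag observations whose residual exceeds a hypothetical threshold.

    A fixed cutoff is a placeholder for a trained classifier.
    """
    return [res > threshold for _, _, res in residual_features(observations, power)]


# Hypothetical readings (x, y, rainfall in mm); the last one is an outlier.
readings = [(0.0, 0.0, 5.0), (1.0, 0.0, 5.0), (0.0, 1.0, 5.0), (1.0, 1.0, 50.0)]
flags = flag_noisy(readings, threshold=20.0)
```

In a fuller pipeline, the tuples returned by `residual_features` would be passed as inputs to a classifier rather than compared against a single cutoff.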
Pages: 26