Quality control of online monitoring data of air pollutants using artificial neural networks

Cited by: 13
Authors
Wang, Ziyu [1]
Feng, Jingjing [1,2]
Fu, Qingyan [3 ]
Gao, Song [3 ]
Chen, Xiaojia [1 ]
Cheng, Jinping [1 ]
Affiliations
[1] Shanghai Jiao Tong Univ, Sch Environm Sci & Engn, Dongchuan Rd 800, Shanghai 200230, Peoples R China
[2] Jinan Univ, Sch Environm, Xingye Rd 855, Guangzhou 511486, Peoples R China
[3] Shanghai Environm Monitoring Ctr, Shanghai 200230, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Data cleaning; Data auditing; Multilayer perceptrons; Recurrent neural network; Automatic monitoring; PREDICTION; POLLUTION; MODEL; AREA;
DOI
10.1007/s11869-019-00734-4
Chinese Library Classification (CLC) number
X [Environmental Science, Safety Science]
Discipline classification code
08; 0830
Abstract
The intensive monitoring of air pollutants has led to the acquisition of vast quantities of data. Traditional quality control methods based on existing knowledge may be inefficient because of our limited understanding regarding the interaction of human activities and stochastic environmental factors. Moreover, traditional methods for outlier detection may be misleading because of the existence of valid outliers and invalid inliers. In this research, artificial neural networks (ANNs) are developed to identify instrument failure based on current and historical observations. Two structures, i.e., multilayer perceptrons and recurrent networks, are trained using 50,000 hourly data points labeled by human reviewers. The most conservative model identified 57.5% of the invalid sulfur compound observations and 44.9% of the invalid nitrogen compound observations. By setting a more liberal threshold, these values increased to 76.0% and 79.7%, respectively. Except for SO2, the ANNs outperformed the traditional methods for data quality control, as demonstrated with a plausibility test, a test of temporal consistency and a residual analysis. Compared with the test of temporal consistency, which was the most effective traditional method studied, the true positive rates of the ANNs were 19.4% to 29.5% higher for all pollutants except SO2, given the same false positive rates. The results indicate the effectiveness of ANNs for data quality control even without supplementary information. Methods for performance improvement are discussed.
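The trade-off the abstract describes between a conservative and a more liberal threshold can be illustrated with a minimal sketch. It assumes a classifier that assigns each observation a score for being invalid; the function name `rates_at_threshold`, the scores, the labels, and both thresholds are hypothetical and are not taken from the paper.

```python
def rates_at_threshold(scores, labels, threshold):
    """Return (true positive rate, false positive rate) when every
    observation with score >= threshold is flagged as invalid.

    labels: 1 = invalid according to a human reviewer, 0 = valid.
    """
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    positives = sum(labels)
    negatives = len(labels) - positives
    return tp / positives, fp / negatives


# Hypothetical model outputs for eight hourly observations (illustration only).
scores = [0.95, 0.80, 0.60, 0.40, 0.30, 0.20, 0.10, 0.05]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

# A high threshold flags fewer points: fewer false alarms, more missed failures.
conservative = rates_at_threshold(scores, labels, threshold=0.70)
# A lower threshold catches more invalid points at the cost of more false alarms.
liberal = rates_at_threshold(scores, labels, threshold=0.35)

print("conservative (TPR, FPR):", conservative)
print("liberal (TPR, FPR):", liberal)
```

Comparing two methods "given the same false positive rates" then amounts to choosing, for each method, the threshold that yields the target false positive rate and reading off the corresponding true positive rates.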
Pages: 1189-1196
Number of pages: 8