Prediction of daily maximum ozone threshold exceedances by preprocessing and ensemble artificial intelligence techniques: Case study of Hong Kong

Cited by: 66
Authors
Gong, Bing [1]
Ordieres-Mere, Joaquin [1]
Affiliations
[1] Univ Politecn Madrid, Dept Ind Engn Business Adm & Stat, ETS Ind Engn, Calle Jose Gutierrez Abascal 2, E-28006 Madrid, Spain
Keywords
Ozone level forecasting; Classification; Artificial intelligence; Re-sampling; Imbalanced data; Ensemble models; NEURAL-NETWORKS; NITROGEN-DIOXIDE; IMBALANCED DATA; CLASSIFICATION; MODELS; REGRESSION; ALGORITHMS; POLLUTION; EMISSIONS; SELECTION
DOI
10.1016/j.envsoft.2016.06.020
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
The objective of this study was to apply preprocessing and ensemble artificial intelligence classifiers to forecast daily maximum ozone threshold exceedances in the Hong Kong area. Preprocessing methods, including over-sampling, under-sampling, and the synthetic minority over-sampling technique, were employed to address the imbalanced data problem. Ensemble algorithms are proposed to improve the classifier's accuracy. Moreover, a distance-based regional data set was generated to capture ozone transportation characteristics. The results show that a combination of preprocessing methods and ensemble algorithms can effectively forecast ozone threshold exceedances. Furthermore, this study advises on the relative importance of the different variables for ozone pollution prediction and confirms that regional data facilitate better forecasting. The results of this research can be promoted by the Hong Kong authorities for improving the existing forecasting tools. Moreover, the results can facilitate researchers' selection of the appropriate techniques in their future research. © 2016 Elsevier Ltd. All rights reserved.
Pages: 290-303
Page count: 14
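
The three re-sampling strategies named in the abstract (over-sampling, under-sampling, and the synthetic minority over-sampling technique) can be illustrated with a minimal sketch. The abstract does not describe the authors' implementation, so everything below is an assumption for illustration: the function names are hypothetical, the data are assumed to be lists of numeric feature rows with binary labels, the minority class is assumed to be the smaller one with at least two rows, and `smote_like` is a simplified interpolation between nearest minority neighbours, not the full published algorithm.

```python
import random


def _split_indices(y, minority_label):
    """Return (minority_indices, majority_indices) for a label list."""
    minority = [i for i, label in enumerate(y) if label == minority_label]
    majority = [i for i, label in enumerate(y) if label != minority_label]
    return minority, majority


def random_oversample(X, y, minority_label, seed=0):
    """Duplicate random minority rows until both classes are the same size."""
    rng = random.Random(seed)
    minority, majority = _split_indices(y, minority_label)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    keep = majority + minority + extra
    return [X[i] for i in keep], [y[i] for i in keep]


def random_undersample(X, y, minority_label, seed=0):
    """Keep a random majority subset equal in size to the minority class."""
    rng = random.Random(seed)
    minority, majority = _split_indices(y, minority_label)
    keep = rng.sample(majority, len(minority)) + minority
    return [X[i] for i in keep], [y[i] for i in keep]


def smote_like(X, y, minority_label, n_new, k=2, seed=0):
    """Append n_new synthetic minority rows (simplified SMOTE-style sketch).

    Each synthetic row lies on the segment between a random minority row
    and one of its k nearest minority neighbours (squared Euclidean distance).
    """
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == minority_label]
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return X + synthetic, y + [minority_label] * n_new
```

With a toy set of six non-exceedance days (label 0) and two exceedance days (label 1), `random_oversample` returns six rows of each class, `random_undersample` returns two of each, and `smote_like(..., n_new=4)` adds four interpolated exceedance rows. In practice the re-sampling would be applied to the training split only, so that evaluation still reflects the true class imbalance.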