Noisy values detection and correction of traffic accident data

Cited by: 20
Authors
Deb, Rupam [1]
Liew, Alan Wee-Chung [1]
Affiliations
[1] Griffith Univ, Sch Informat & Commun, Gold Coast Campus, Nathan, Qld 4222, Australia
Keywords
Data cleansing; Noisy value detection; Road traffic accident; Data preprocessing; Categorical data; Missing value imputation
DOI
10.1016/j.ins.2018.10.002
CLC classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Death, injury, and disability from road traffic crashes continue to be a major global public health problem. Therefore, methods to reduce accident severity are of significant interest to traffic agencies and the public at large. Noisy data in traffic accident datasets obscure the discovery of important factors and lead to misleading conclusions. Identifying and correcting noisy values is an important goal of data cleansing and preprocessing. This paper proposes a new algorithm called NoiseCleaner to identify and correct noisy categorical attribute values in large traffic accident datasets. We evaluate our algorithm using four publicly available traffic accident datasets from Australia and the United States, namely, two road crash datasets from the Queensland Government data depository (data.qld.gov.au) and two datasets from New York's open data portal (data.ny.gov). We compare our technique with several existing state-of-the-art methods and show that our algorithm performs significantly better than the existing algorithms. (C) 2018 Elsevier Inc. All rights reserved.
Pages: 132-146
Page count: 15
Related papers
17 in total
  • [1] A fast and noise resilient cluster-based anomaly detection
    Bigdeli, Elnaz
    Mohammadi, Mahdi
    Raahemi, Bijan
    Matwin, Stan
    [J]. PATTERN ANALYSIS AND APPLICATIONS, 2017, 20 (01) : 183 - 199
  • [2] Analysis of traffic injury severity: An application of non-parametric classification tree techniques
    Chang, Li-Yen
    Wang, Hsiu-Wen
    [J]. ACCIDENT ANALYSIS AND PREVENTION, 2006, 38 (05) : 1019 - 1027
  • [3] Choh Man Teng, 2001, Proceedings of the Fourteenth International Florida Artificial Intelligence Research Society Conference, P269
  • [4] Deb, 2014, P ICMLC 2014 C CCIS, P275
  • [5] Deb R., 2015, P IJCNN C KILL IR JU, P1
  • [6] Missing value imputation for the analysis of incomplete traffic accident data
    Deb, Rupam
    Liew, Alan Wee-Chung
    [J]. INFORMATION SCIENCES, 2016, 339 : 274 - 289
  • [7] Deb R, 2014, LECT NOTES ARTIF INT, V8862, P905, DOI 10.1007/978-3-319-13560-1_77
  • [8] Delany SJ, 2009, LECT NOTES ARTIF INT, V5650, P135, DOI 10.1007/978-3-642-02998-1_11
  • [9] A novel approach for traffic accidents sanitary resource allocation based on multi-objective genetic algorithms
    Fogue, Manuel
    Garrido, Piedad
    Martinez, Francisco J.
    Cano, Juan-Carlos
    Calafate, Carlos T.
    Manzoni, Pietro
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2013, 40 (01) : 323 - 336
  • [10] Microarray missing data imputation based on a set theoretic framework and biological knowledge
    Gan, XC
    Liew, AWC
    Yan, H
    [J]. NUCLEIC ACIDS RESEARCH, 2006, 34 (05) : 1608 - 1619