A comparison of outlier detection algorithms for ITS data

Cited by: 57
Authors
Chen, Shuyan [1, 2]
Wang, Wei [1]
van Zuylen, Henk [2]
Affiliations
[1] Southeast Univ, Transportat Coll, Nanjing 210096, Peoples R China
[2] Delft Univ Technol, NL-2600 GA Delft, Netherlands
Funding
China Postdoctoral Science Foundation
Keywords
Outlier detection; Traffic data; Statistics-based; Distance-based; Density-based;
DOI
10.1016/j.eswa.2009.06.008
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In order to improve the veracity and reliability of a traffic model, or to extract important and valuable information from collected traffic data, the technique of outlier mining has been introduced into the traffic engineering domain for detecting and analyzing the outliers in traffic data sets. Three typical outlier algorithms, namely the statistics-based approach, the distance-based approach, and the density-based local outlier approach, are described with respect to the principle, the characteristics, and the time complexity of the algorithms. A comparison among the three algorithms is made through application to intelligent transportation systems (ITS). Two traffic data sets with different dimensions have been used in our experiments: one is travel time data, and the other is traffic flow data. We conducted a number of experiments to recognize outliers hidden in the data sets before building the travel time prediction model and the traffic flow fundamental diagram. In addition, some artificially generated outliers are introduced into the traffic flow data to see how well the different algorithms detect them. Three strategies, based on ensemble learning, partition, and average LOF, have been proposed to develop a better outlier recognizer. The experimental results reveal that these methods of outlier mining are feasible and valid for detecting outliers in traffic data sets, and have good potential for use in the domain of traffic engineering. The comparison and analysis presented in this paper are expected to provide some insights to practitioners who plan to use outlier mining for ITS data. (C) 2009 Published by Elsevier Ltd.
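The three algorithm families named in the abstract can be illustrated with a rough sketch. This is not the authors' implementation: the function names, the z-score rule used for the statistics-based case, the DB(p, D) formulation used for the distance-based case, and all thresholds are assumptions chosen for illustration, and the LOF version ignores ties at the k-distance and assumes no duplicate points.

```python
import math
from statistics import mean, stdev


def zscore_outliers(values, threshold=3.0):
    """Statistics-based (assumed z-score rule): flag values far from the mean."""
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


def distance_outliers(points, radius, min_fraction):
    """Distance-based, DB(p, D) style: a point is flagged when at least
    `min_fraction` of all points lie farther than `radius` from it."""
    n = len(points)
    flagged = []
    for i, p in enumerate(points):
        far = sum(1 for q in points if math.dist(p, q) > radius)
        if far / n >= min_fraction:
            flagged.append(i)
    return flagged


def lof_scores(points, k):
    """Density-based local outlier factor (simplified: exactly k neighbours,
    ties ignored, duplicate points not handled). Scores near 1 are inliers."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    neighbours, k_distance = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        neighbours.append(order[:k])
        k_distance.append(dist[i][order[k - 1]])
    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_distance[j], dist[i][j]) for j in neighbours[i]]
        lrd.append(len(reach) / sum(reach))
    # LOF: average neighbour density relative to the point's own density.
    return [sum(lrd[j] for j in neighbours[i]) / (k * lrd[i]) for i in range(n)]
```

The statistics-based check needs only one pass over the data, while the distance-based and LOF sketches compare every pair of points, so their cost grows quadratically with the number of records in this naive form.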
Pages: 1169-1178
Number of pages: 10