Missing traffic data: comparison of imputation methods

Cited by: 145
Authors
Li, Yuebiao [1]
Li, Zhiheng [1]
Li, Li [1]
Affiliations
[1] Tsinghua Univ, Dept Automat, Beijing 100084, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
interpolation; principal component analysis; probability; traffic engineering computing; road traffic control; traffic management applications; traffic control applications; traffic flow data prediction; sensor failure; transmission error; missing traffic data estimation; data imputation methods; prediction methods; interpolation methods; statistical learning methods; reconstruction errors; statistical behaviours; running speeds; probabilistic principal component analysis; PPCA; numerical tests; FLOW PREDICTION; NEURAL-NETWORKS; MODELS;
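DOI
10.1049/iet-its.2013.0052
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology];
Discipline classification code
0808 ; 0809 ;
Abstract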
Many traffic management and control applications require highly complete and accurate traffic flow data. However, for reasons such as sensor failure or transmission error, some traffic flow data are commonly lost. As a result, various methods using a wide spectrum of techniques have been proposed over the last two decades to estimate missing traffic data. Generally, these missing data imputation methods can be categorised into three kinds: prediction methods, interpolation methods and statistical learning methods. To assess their performance, this paper compares these methods in terms of reconstruction errors, statistical behaviours and running speeds. Results show that statistical learning methods are more effective than the other two kinds of imputation methods when data from a single detector are utilised. Among the various methods, probabilistic principal component analysis (PPCA) yields the best performance in all aspects. Numerical tests demonstrate that PPCA can be used to impute data online before further analysis (e.g. traffic prediction) and is robust to weather changes.
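The abstract does not describe the PPCA procedure itself, so the following is only a minimal sketch of how PPCA-style imputation for a single detector might look. It assumes the flow data are arranged as a matrix with one row per day and one column per time-of-day interval, with missing readings stored as NaN. The function name `ppca_impute`, its parameters, and the synthetic data are all hypothetical. The sketch refits the closed-form maximum-likelihood PPCA parameters on the currently filled matrix and then replaces each missing entry with its conditional mean given the observed entries of the same day. This is a simplification of the full EM treatment of missing data and should not be read as the authors' implementation.

```python
import numpy as np


def ppca_impute(X, n_components=2, n_iter=50, tol=1e-6):
    """Fill NaN entries of X (days x intervals) with a PPCA-style estimate.

    Hypothetical helper: assumes n_components < number of columns and that
    every column has at least one observed value.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Start from column-mean imputation.
    filled = np.where(missing, np.nanmean(X, axis=0), X)
    q = n_components
    rows_with_gaps = np.where(missing.any(axis=1))[0]

    for _ in range(n_iter):
        # Closed-form ML estimate of the PPCA parameters on the filled data.
        mu = filled.mean(axis=0)
        cov = np.cov(filled, rowvar=False, bias=True)
        eigvals, eigvecs = np.linalg.eigh(cov)
        order = np.argsort(eigvals)[::-1]
        eigvals, eigvecs = eigvals[order], eigvecs[:, order]
        sigma2 = max(eigvals[q:].mean(), 1e-8)  # noise variance
        W = eigvecs[:, :q] * np.sqrt(np.maximum(eigvals[:q] - sigma2, 0.0))

        previous = filled.copy()
        for i in rows_with_gaps:
            m = missing[i]
            o = ~m
            if not o.any():
                filled[i] = mu  # nothing observed on this day
                continue
            # Posterior mean of the latent vector given the observed entries.
            Wo = W[o]
            M = Wo.T @ Wo + sigma2 * np.eye(q)
            z = np.linalg.solve(M, Wo.T @ (X[i, o] - mu[o]))
            # Conditional mean of the missing entries.
            filled[i, m] = W[m] @ z + mu[m]

        if np.max(np.abs(filled - previous)) < tol:
            break
    return filled


# Synthetic example (made-up data, not from the paper): 60 days, 24 intervals.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 2.0 * np.pi, 24)
basis = np.vstack([np.sin(t), np.cos(t)])
scores = rng.normal(size=(60, 2)) * np.array([50.0, 20.0])
flows = 300.0 + scores @ basis + rng.normal(scale=1.0, size=(60, 24))

mask = rng.random(flows.shape) < 0.1  # drop roughly 10% of readings
observed = flows.copy()
observed[mask] = np.nan

imputed = ppca_impute(observed, n_components=2)
rmse = np.sqrt(np.mean((imputed[mask] - flows[mask]) ** 2))
print(f"RMSE on held-out entries: {rmse:.3f}")
```

Observed readings are left untouched; only the NaN positions are overwritten. The number of latent components is a tuning choice that the abstract does not specify.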
Pages: 51-57
Number of pages: 7