Efficient missing data imputing for traffic flow by considering temporal and spatial dependence

Cited by: 231
Authors
Li, Li [1]
Li, Yuebiao [1]
Li, Zhiheng [1]
Affiliations
[1] Tsinghua Univ, Tsinghua Natl Lab Informat Sci & Technol TNList, Dept Automat, Beijing 100084, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Traffic flow; Missing data; Temporal and spatial dependence; Probabilistic principal component analysis (PPCA); Kernel probabilistic principal component analysis (KPPCA); INTELLIGENT TRANSPORTATION SYSTEMS; PRINCIPAL COMPONENT ANALYSIS; NEURAL-NETWORKS; IMPUTATION; MULTIVARIATE; PREDICTION; MODELS; REGRESSION; VALUES
DOI
10.1016/j.trc.2013.05.008
Chinese Library Classification (CLC)
U [Transportation]
Discipline classification code
08; 0823
Abstract
Missing data remains a difficulty in a wide variety of transportation applications, e.g. traffic flow prediction and traffic pattern recognition. To address this problem, numerous algorithms have been proposed in the last decade to impute the missing data. However, few existing studies have fully used the traffic flow information of neighboring detection points to improve imputation performance. In this paper, the probabilistic principal component analysis (PPCA) based imputation method, which has been shown to be one of the most effective imputation methods that do not use temporal or spatial dependence, is extended to utilize the information of multiple points. We systematically examine the potential benefits of multi-point data fusion and study the possible influence of measurement time lags. Tests indicate that the hidden temporal-spatial dependence is nonlinear and is better retrieved by the kernel probabilistic principal component analysis (KPPCA) based method than by the PPCA method. Comparisons show that imputation errors can be notably reduced if temporal-spatial dependence is appropriately considered. (C) 2013 Elsevier Ltd. All rights reserved.
Pages: 108-120
Number of pages: 13