Using traffic flow characteristics to predict real-time conflict risk: A novel method for trajectory data analysis

Cited by: 67
Authors
Yuan, Chen [1, 2]
Li, Ye [1 ]
Huang, Helai [1 ]
Wang, Shiqi [2 ]
Sun, Zhenhao [2 ]
Li, Yan [3 ]
Affiliations
[1] Cent South Univ, Sch Traff & Transportat Engn, Changsha 410075, Hunan, Peoples R China
[2] City Univ Hong Kong, Dept Comp Sci, Kowloon, Hong Kong, Peoples R China
[3] Hunan Xiangjiang Intelligent Technol Innovat Ctr C, Changsha 410075, Hunan, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Real-time conflict risk; Heterogeneity; Random parameter; Machine learning; CRASH RISK; STATISTICAL-ANALYSIS; ANALYTIC METHODS; SAFETY ANALYSIS; INJURY; SPEED; FREQUENCY; MODELS; TRANSFERABILITY; HETEROGENEITY;
DOI
10.1016/j.amar.2022.100217
Chinese Library Classification (CLC) number
R1 [Preventive Medicine, Hygiene]
Discipline classification code
1004; 120402
Abstract
Real-time conflict prediction models based on traffic flow characteristics have been studied far less than crash-based models. This study explores the relationship between conflicts and traffic flow features while accounting for heterogeneity, and develops predictive models that identify conflict-prone conditions in real time. High-resolution trajectory data from the HighD dataset serve as the empirical data. A novel method for trajectory data analysis is proposed, combining a virtual detector approach for traffic feature extraction with a two-step framework. The framework consists of an exploratory study using a random parameter logit model with heterogeneity in means and variances, and a comparative study of several machine learning methods: eXtreme Gradient Boosting (boosting), Random Forest (bagging), Support Vector Machine (single classifier), and Multilayer Perceptron (deep neural network). Results indicate that (1) traffic flow characteristics significantly affect the probability of conflict occurrence; (2) the statistical model that considers heterogeneity in means outperforms its counterpart, and lane difference variables significantly affect the means of the random parameters for both lane variables and lane difference variables; (3) eXtreme Gradient Boosting trained on an under-sampled dataset is the best model, with the highest AUC of 0.871 and a precision of 0.867, showing that re-sampling techniques can significantly improve model performance. The proposed model is sensitive to the conflict threshold. A sensitivity analysis on feature selection further confirms that conflict risk prediction should consider both subject lane features and lane difference features, consistent with the exploratory analysis based on the statistical model. This consistency between the statistical models and the machine learning methods improves the interpretability of the latter's results.
(c) 2022 Elsevier Ltd. All rights reserved.
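The abstract credits under-sampling with the best AUC. As a rough illustration only, the sketch below shows random under-sampling of the majority (non-conflict) class and a pairwise AUC computation in plain Python. It does not reproduce the paper's pipeline or its XGBoost model; the field names `speed_diff` and `conflict`, the toy data, and the 1:1 class ratio are all assumptions made for this example.

```python
import random


def undersample(rows, label_key="conflict", seed=0):
    """Randomly drop majority-class rows until both classes have equal size."""
    rng = random.Random(seed)
    pos = [r for r in rows if r[label_key] == 1]
    neg = [r for r in rows if r[label_key] == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    kept = rng.sample(majority, len(minority))  # sample without replacement
    balanced = minority + kept
    rng.shuffle(balanced)
    return balanced


def auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 10 conflict intervals among 100 observations.
rows = [{"speed_diff": float(i), "conflict": 1 if i % 10 == 0 else 0}
        for i in range(100)]
balanced = undersample(rows)

# Hypothetical classifier scores for four intervals.
toy_auc = auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

In practice only the training split would be under-sampled, so that the AUC is measured on data that keeps the original class imbalance.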
Pages: 16