Prediction in Traffic Accident Duration Based on Heterogeneous Ensemble Learning

Cited by: 34
Authors
Zhao, Yuexu [1]
Deng, Wei [1]
Affiliation
[1] Hangzhou Dianzi Univ, Coll Econ, Hangzhou, Peoples R China
Keywords
BAYESIAN-NETWORKS; INCIDENT; METHODOLOGY;
DOI
10.1080/08839514.2021.2018643
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Using millions of traffic accident records from the United States, we build a heterogeneous ensemble learning model that predicts accident duration in the initial stage of an accident. First, focusing on the early stage of accident development, we select informative features from five aspects: traffic, location, weather, points of interest, and time attributes. Next, we improve data quality through data cleaning, outlier handling, and missing-value handling. We also encode high-frequency categorical variables and use feature extraction to draw deeper information from the limited initial data, which together establish a preprocessing scheme for accident duration data. Finally, from the perspectives of model, sample, and parameter diversity, we combine XGBoost, LightGBM, CatBoost, stacking, and elastic net to build a heterogeneous ensemble learning model for predicting accident duration. The results show that the model not only achieves good prediction accuracy but also aggregates multiple models into a combined ranking of the importance of influencing factors. The feature importances indicate that the time, location, weather, and relevant historical statistics of an accident are important for its duration.
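The stacking step named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes scikit-learn is available, uses scikit-learn tree ensembles as stand-ins for XGBoost, LightGBM and CatBoost, and trains on synthetic data in place of the accident records. All hyperparameters (`n_estimators`, `alpha`, `l1_ratio`, `cv`) are illustrative assumptions, not values from the paper.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import (
    ExtraTreesRegressor,
    GradientBoostingRegressor,
    RandomForestRegressor,
    StackingRegressor,
)
from sklearn.linear_model import ElasticNet
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for preprocessed accident features (X) and durations (y).
X, y = make_regression(n_samples=400, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Heterogeneous base learners (stand-ins for XGBoost, LightGBM, CatBoost).
base_learners = [
    ("gbr", GradientBoostingRegressor(n_estimators=100, random_state=0)),
    ("rf", RandomForestRegressor(n_estimators=100, random_state=0)),
    ("et", ExtraTreesRegressor(n_estimators=100, random_state=0)),
]

# The elastic net meta-learner is fit on out-of-fold base predictions (cv=5).
stack = StackingRegressor(
    estimators=base_learners,
    final_estimator=ElasticNet(alpha=0.1, l1_ratio=0.5),
    cv=5,
)
stack.fit(X_train, y_train)

pred = stack.predict(X_test)
mae = mean_absolute_error(y_test, pred)

# Reference point: always predict the mean training duration.
baseline_pred = np.full_like(y_test, y_train.mean())
baseline_mae = mean_absolute_error(y_test, baseline_pred)

print(f"stacked MAE: {mae:.2f}, mean-baseline MAE: {baseline_mae:.2f}")
print("meta-learner weights:", stack.final_estimator_.coef_)
```

The meta-learner weights give one coefficient per base model, which is one simple way to see how a stacked ensemble balances its members. Swapping in the actual boosting libraries would only change the `base_learners` list.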
Pages: 24