Predicting crash occurrence at intersections in Texas: an opportunity for machine learning

Cited by: 2
Authors
Charm, Theodore [1]
Wang, Haoqi [2]
Zuniga-Garcia, Natalia [3]
Ahmed, Mostaq [4]
Kockelman, Kara M. [3]
Affiliations
[1] Univ Texas Austin, Dept Govt, Austin, TX USA
[2] Univ Texas Austin, Dept Biomed Engn, Austin, TX USA
[3] Univ Texas Austin, Dept Civil Architectural & Environm Engn, Austin, TX 78712 USA
[4] Univ Texas Austin, Dept Community & Reg Planning, Austin, TX USA
Keywords
Motor vehicle crashes; intersection safety; crash counts; machine learning; imbalanced data; binomial-Lindley model; driver injury severity; traffic accidents; highways
DOI
10.1080/03081060.2023.2177651
Chinese Library Classification number
U [Transportation]
Subject classification code
08; 0823
Abstract
This paper predicts the frequency of traffic crashes at intersections across Texas using Zero-inflated Negative Binomial (ZINB) and Negative Binomial-Lindley (NB-L) generalized linear models, as well as several tree-based machine learning (ML) methods: Random Forests (RF), Extreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM), and Bayesian Additive Regression Trees (BART). Official crash reports from 2010 through 2019 were linked to more than 700,000 Texas intersections. RF provided the best prediction performance (by R-squared and Root Mean Square Error) while handling highly imbalanced crash data (with many zero-crash cases) well. Sensitivity analysis highlights the practical significance of whether an intersection is signalized, annual average daily traffic, the number of lanes on intersection approaches, and other covariates.
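The abstract names R-squared and Root Mean Square Error as the comparison metrics but does not say how they were computed. The following is a minimal sketch using the standard textbook definitions; the crash counts are hypothetical, illustrative values and are not taken from the paper.

```python
import math

def rmse(observed, predicted):
    """Root mean square error between observed and predicted counts."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Hypothetical zero-heavy crash counts at ten intersections (illustrative only).
observed = [0, 0, 0, 0, 0, 0, 0, 1, 2, 7]

# A naive baseline that predicts the mean count at every intersection.
baseline = [sum(observed) / len(observed)] * len(observed)

print("baseline RMSE:", rmse(observed, baseline))
print("baseline R-squared:", r_squared(observed, baseline))
```

A mean-only baseline scores an R-squared of zero by construction, so it gives a floor against which any count model or tree-based predictor on zero-heavy data can be compared.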
Pages: 1184-1204
Page count: 21
Related papers
57 in total
[1]   Development of artificial neural network models to predict driver injury severity in traffic accidents at signalized intersections [J].
Abdelwahab, HT ;
Abdel-Aty, MA .
HIGHWAY SAFETY: MODELING, ANALYSIS, MANAGEMENT, STATISTICAL METHODS, AND CRASH LOCATION: SAFETY AND HUMAN PERFORMANCE, 2001, (1746) :6-13
[2]   [Anonymous], 2020, American Community Survey website
[4]   The Target Parameter of Adjusted R-Squared in Fixed-Design Experiments [J].
Bar-Gera, Hillel .
AMERICAN STATISTICIAN, 2017, 71 (02) :112-119
[5]   A random parameters with heterogeneity in means and Lindley approach to analyze crash data with excessive zeros: A case study of head-on heavy vehicle crashes in Queensland [J].
Behara, Krishna N. S. ;
Paz, Alexander ;
Arndt, Owen ;
Baker, Douglas .
ACCIDENT ANALYSIS AND PREVENTION, 2021, 160
[6]   Random forests [J].
Breiman, L .
MACHINE LEARNING, 2001, 45 (01) :5-32
[7]   C.R.I.S. Texas Department of Transportation, 2020, CRIS QUER
[8]   A crash-prediction model for multilane roads [J].
Caliendo, Ciro ;
Guida, Maurizio ;
Parisi, Alessandra .
ACCIDENT ANALYSIS AND PREVENTION, 2007, 39 (04) :657-670
[9]   Visualizing the Feature Importance for Black Box Models [J].
Casalicchio, Giuseppe ;
Molnar, Christoph ;
Bischl, Bernd .
MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2018, PT I, 2019, 11051 :655-670
[10]   Data mining of tree-based models to analyze freeway accident frequency [J].
Chang, LY ;
Chen, WC .
JOURNAL OF SAFETY RESEARCH, 2005, 36 (04) :365-375