Railroad accident analysis using extreme gradient boosting

Times cited: 20
Authors
Bridgelall, Raj [1]
Tolliver, Denver D. [2]
Affiliations
[1] North Dakota State University, Department of Transportation, Logistics & Finance, College of Business, Fargo, ND 58108, USA
[2] North Dakota State University, Upper Great Plains Transportation Institute, Fargo, ND 58108, USA
Keywords
Data cleaning; Feature engineering; Financial loss; Machine learning; Principal component analysis; Risk management; MODELS
DOI
10.1016/j.aap.2021.106126
Chinese Library Classification (CLC) number
TB18 [Ergonomics]
Subject classification code
1201
Abstract
Railroads are critical to the economic health of a nation. Unfortunately, railroads lose hundreds of millions of dollars to accidents each year. Trends reveal that derailments consistently account for more than 70% of the U.S. railroad industry's average annual accident cost. Hence, knowledge of the explanatory factors that distinguish derailments from other accident types can inform more cost-effective and impactful railroad risk management strategies. Five feature scoring methods, including ANOVA and Gini, agreed that the top four explanatory factors for predicting accident type were track class, type of movement authority, excess speed, and territory signalization. Among 11 different machine learning algorithms, extreme gradient boosting was the most effective at predicting accident type, with an area under the receiver operating characteristic curve (AUC) of 89%. Principal component analysis revealed that, relative to other accident types, derailments were more strongly associated with lower track classes, non-signalized territories, and movement authorizations within restricted limits. On average, derailments occurred at 16 kph below the speed limit for the track class, whereas other accident types occurred at 32 kph below the speed limit. Railroads can use the integrated data preparation, machine learning, and feature ranking framework presented here to gain additional insights for managing risk based on their unique operating environments.
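Two of the building blocks named in the abstract, the ANOVA score used to rank explanatory features and the AUC metric used to compare classifiers, can be illustrated without any machine learning library. The sketch below is a minimal, hypothetical illustration in standard-library Python, not the authors' implementation: the functions `anova_f` and `roc_auc`, the toy `track_class` values, and the `is_derailment` labels are all invented for demonstration and do not come from the paper's data. It also does not reproduce the extreme gradient boosting model itself; a crude single-feature ranker stands in for a trained classifier.

```python
from statistics import fmean


def anova_f(values, labels):
    """One-way ANOVA F-statistic of a numeric feature across class labels."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    k, n = len(groups), len(values)
    grand_mean = fmean(values)
    means = {y: fmean(g) for y, g in groups.items()}
    # Between-group sum of squares: how far each class mean sits from the grand mean.
    ss_between = sum(len(g) * (means[y] - grand_mean) ** 2 for y, g in groups.items())
    # Within-group sum of squares: scatter of each value around its own class mean.
    ss_within = sum((v - means[y]) ** 2 for y, g in groups.items() for v in g)
    # F = between-group mean square / within-group mean square.
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def roc_auc(scores, labels):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney formulation of the ROC AUC).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (NOT from the paper): track class of each accident,
# labeled 1 for a derailment and 0 for any other accident type.
track_class = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
is_derailment = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

print(f"ANOVA F for track class: {anova_f(track_class, is_derailment):.2f}")  # 8.70
# Crude single-feature ranker (a stand-in for a trained model such as XGBoost):
# lower track class -> higher derailment score.
scores = [-c for c in track_class]
print(f"AUC of the crude ranker: {roc_auc(scores, is_derailment):.2f}")  # 0.92
```

The ANOVA F-statistic rewards a feature whose class means are far apart relative to the scatter within each class, which is why it can rank candidate explanatory factors. AUC is threshold-free, which makes it a convenient single number for comparing many classifiers. In practice, scikit-learn provides both as `sklearn.feature_selection.f_classif` and `sklearn.metrics.roc_auc_score`.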
Pages: 14