A Bibliometric Analysis and Benchmark of Machine Learning and AutoML in Crash Severity Prediction: The Case Study of Three Colombian Cities

Cited by: 19
Authors
Angarita-Zapata, Juan S. [1]
Maestre-Gongora, Gina [2]
Calderin, Jenny Fajardo [1]
Affiliations
[1] Univ Deusto, Fac Engn, DeustoTech, Bilbao 48007, Spain
[2] Univ Cooperat Colombia, Fac Engn, Medellin 050012, Colombia
Funding
EU Horizon 2020;
Keywords
crash severity prediction; supervised learning; machine learning; automated machine learning; intelligent transportation systems; Internet of Things; DRIVER INJURY SEVERITY; TRAFFIC ACCIDENTS; INTELLIGENCE; FREQUENCY; IMPACT; SAFETY; MODEL;
DOI
10.3390/s21248401
Chinese Library Classification
O65 [Analytical Chemistry];
Discipline classification codes
070302 ; 081704 ;
Abstract
Traffic accidents are of worldwide concern, as they are one of the leading causes of death globally. One policy designed to cope with them is the design and deployment of road safety systems. These aim to predict crashes based on historical records, provided by new Internet of Things (IoT) technologies, to enhance traffic flow management and promote safer roads. Increasing data availability has helped machine learning (ML) to address the prediction of crashes and their severity. The literature reports numerous contributions regarding survey papers, experimental comparisons of various techniques, and the design of new methods at the point where crash severity prediction (CSP) and ML converge. Despite such progress, and as far as we know, there are no comprehensive research articles that theoretically and practically approach the model selection problem (MSP) in CSP. Thus, this paper introduces a bibliometric analysis and experimental benchmark of ML and automated machine learning (AutoML) as a suitable approach to automatically address the MSP in CSP. First, 2318 bibliographic references were consulted to identify relevant authors, trending topics, keyword evolution, and the most common ML methods used in related case studies, which revealed an opportunity for the use of AutoML in the transportation field. Then, we compared AutoML (AutoGluon, Auto-sklearn, TPOT) and ML (CatBoost, Decision Tree, Extra Trees, Gradient Boosting, Gaussian Naive Bayes, Light Gradient Boosting Machine, Random Forest) methods in three case studies using open data portals belonging to the cities of Medellin, Bogota, and Bucaramanga in Colombia. Our experimentation reveals that AutoGluon and CatBoost are competitive and robust ML approaches to deal with various CSP problems. In addition, we concluded that general-purpose AutoML effectively supports the MSP in CSP without developing domain-focused AutoML methods for this supervised learning problem.
Finally, based on the results obtained, we introduce challenges and research opportunities that the community should explore to enhance the contributions that ML and AutoML can bring to CSP and other transportation areas.
Pages: 22