An Interpretable Machine Learning-Based Hurdle Model for Zero-Inflated Road Crash Frequency Data Analysis: Real-World Assessment and Validation

Cited by: 0
Authors
Khedher, Moataz Bellah Ben [1 ,2 ]
Yun, Dukgeun [1 ,2 ]
Affiliations
[1] Univ Sci & Technol, KICT Sch, Dept Civil & Environm Engn, Daejeon 34113, South Korea
[2] Korea Inst Civil Engn & Bldg Technol, Dept Highway & Transportat Res, Goyang Si 10223, South Korea
Source
APPLIED SCIENCES-BASEL | 2024 / Vol. 14 / Issue 23
Keywords
crash frequency; machine learning; CatBoost; SHAP; accident analysis; road safety; MOTOR-VEHICLE CRASHES; NEURAL-NETWORK; REGRESSION; POISSON;
DOI
10.3390/app142310790
Chinese Library Classification number
O6 [Chemistry];
Discipline classification code
0703 ;
Abstract
Road traffic crashes pose significant economic and public health burdens, necessitating an in-depth understanding of crash causation and its links to underlying factors. This study introduces a machine learning-based hurdle model framework tailored for analyzing zero-inflated crash frequency data, addressing the limitations of traditional statistical models like the Poisson and negative binomial models, which struggle with zero-inflation and overdispersion. The research employs a two-stage modeling process using CatBoost. The first stage uses binary classification to identify road segments with potential crash occurrences, applying a customized loss function to tackle data imbalance. The second stage predicts crash frequency, also utilizing a customized loss function for count data. SHapley Additive exPlanations (SHAP) analysis interprets the model outcomes, providing insights into factors affecting crash likelihood and frequency. This study validates the model's performance with real-world crash data from 2011 to 2015 in South Korea, demonstrating superior accuracy in both the classification and regression stages compared to other machine learning algorithms and traditional models. These findings have significant implications for traffic safety research and policymaking, offering stakeholders a more accurate and interpretable tool for crash data analysis to develop targeted safety interventions.
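The two-stage structure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: CatBoost and the paper's customized loss functions are replaced here by two simple stand-ins (a class-weighted logistic regression for stage one and a plain Poisson regression fitted only on segments with crashes for stage two), and the single feature, the toy data, and every function name are hypothetical. The sketch only shows how a hurdle model combines the stages, assuming the expected count is P(crash > 0) multiplied by the expected count given at least one crash.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_weighted_logistic(xs, ys, pos_weight, lr=0.1, epochs=2000):
    """Stage 1 stand-in: one-feature logistic regression with a
    class-weighted log loss (positives up-weighted against imbalance)."""
    w, b, n = 0.0, 0.0, len(xs)
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            weight = pos_weight if y == 1 else 1.0
            err = weight * (sigmoid(w * x + b) - y)
            gw += err * x
            gb += err
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b

def fit_poisson(xs, counts, lr=0.05, epochs=5000):
    """Stage 2 stand-in: one-feature Poisson regression (log link),
    fitted by gradient descent on the Poisson negative log-likelihood."""
    w, b, n = 0.0, 0.0, len(xs)
    for _ in range(epochs):
        gw = gb = 0.0
        for x, c in zip(xs, counts):
            mu = math.exp(w * x + b)
            gw += (mu - c) * x
            gb += mu - c
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b

# Hypothetical toy data: (normalized exposure, crash count) per road segment.
segments = [(0.10, 0), (0.15, 0), (0.20, 0), (0.25, 0), (0.30, 0),
            (0.35, 1), (0.40, 0), (0.50, 0), (0.55, 0), (0.60, 1),
            (0.70, 0), (0.80, 2), (0.90, 3), (1.00, 4)]

xs = [x for x, _ in segments]
has_crash = [1 if c > 0 else 0 for _, c in segments]
n_pos = sum(has_crash)
n_neg = len(segments) - n_pos

# Stage 1: does the segment have any crash at all?
clf = fit_weighted_logistic(xs, has_crash, pos_weight=n_neg / n_pos)

# Stage 2: how many crashes, using only segments that passed the hurdle.
positive = [(x, c) for x, c in segments if c > 0]
reg = fit_poisson([x for x, _ in positive], [c for _, c in positive])

def crash_probability(x):
    return sigmoid(clf[0] * x + clf[1])

def conditional_count(x):
    return math.exp(reg[0] * x + reg[1])

def hurdle_expected(x):
    """Expected crash count = P(count > 0) * E[count | count > 0]."""
    return crash_probability(x) * conditional_count(x)

p_low, p_high = crash_probability(0.1), crash_probability(0.9)
expected_low, expected_high = hurdle_expected(0.1), hurdle_expected(0.9)
cond_high = conditional_count(0.9)
```

A stricter hurdle formulation would use a zero-truncated count distribution in the second stage; the untruncated Poisson fit above is a simplification chosen to keep the sketch short.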
Pages: 30
Related papers
8 items in total
  • [1] Modeling the service-route-based crash frequency by a spatiotemporal-random-effect zero-inflated negative binomial model: An empirical analysis for bus-involved crashes
    Gu, Xujia
    Yan, Xuedong
    Ma, Lu
    Liu, Xiaobing
    ACCIDENT ANALYSIS AND PREVENTION, 2020, 144
  • [2] A machine learning model for predicting blood concentration of quetiapine in patients with schizophrenia and depression based on real-world data
    Hao, Yupei
    Zhang, Jinyuan
    Yang, Lin
    Zhou, Chunhua
    Yu, Ze
    Gao, Fei
    Hao, Xin
    Pang, Xiaolu
    Yu, Jing
    BRITISH JOURNAL OF CLINICAL PHARMACOLOGY, 2023, 89 (09) : 2714 - 2725
  • [3] The biomedical knowledge graph of symptom phenotype in coronary artery plaque: machine learning-based analysis of real-world clinical data
    Huan, Jia-Ming
    Wang, Xiao-Jie
    Li, Yuan
    Zhang, Shi-Jun
    Hu, Yuan-Long
    Li, Yun-Lun
    BIODATA MINING, 2024, 17 (01)
  • [4] A Framework for Using Real-World Data and Health Outcomes Modeling to Evaluate Machine Learning-Based Risk Prediction Models
    Rodriguez, Patricia J.
    Veenstra, David L.
    Heagerty, Patrick J.
    Goss, Christopher H.
    Ramos, Kathleen J.
    Bansal, Aasthaa
    VALUE IN HEALTH, 2022, 25 (03) : 350 - 358
  • [5] Identifying Adolescent Depression and Anxiety Through Real-World Data and Social Determinants of Health: Machine Learning Model Development and Validation
    Mardini, Mamoun T.
    Khalil, Georges E.
    Bai, Chen
    Divakaran, Aparna Menon
    Ray, Jessica M.
    JMIR MENTAL HEALTH, 2025, 12
  • [6] Machine learning-based predictive and risk analysis using real-world data with blood biomarkers for hepatitis B patients in the malignant progression of hepatocellular carcinoma
    Nan, Yuemin
    Zhao, Suxian
    Zhang, Xiaoxiao
    Xiao, Zhifeng
    Guo, Ruihan
    FRONTIERS IN IMMUNOLOGY, 2022, 13
  • [7] A clinical prediction model based on interpretable machine learning algorithms for prolonged hospital stay in acute ischemic stroke patients: a real-world study
    Wang, Kai
    Jiang, Qianmei
    Gao, Murong
    Wei, Xiu'e
    Xu, Chan
    Yin, Chengliang
    Liu, Haiyan
    Gu, Renjun
    Wang, Haosheng
    Li, Wenle
    Rong, Liangqun
    FRONTIERS IN ENDOCRINOLOGY, 2023, 14
  • [8] Data-driven prediction of prolonged air leak after video-assisted thoracoscopic surgery for lung cancer: Development and validation of machine-learning-based models using real-world data through the ePath system
    Tou, Saori
    Matsumoto, Koutarou
    Hashinokuchi, Asato
    Kinoshita, Fumihiko
    Nakaguma, Hideki
    Kozuma, Yukio
    Sugeta, Rui
    Nohara, Yasunobu
    Yamashita, Takanori
    Wakata, Yoshifumi
    Takenaka, Tomoyoshi
    Iwatani, Kazunori
    Soejima, Hidehisa
    Yoshizumi, Tomoharu
    Nakashima, Naoki
    Kamouchi, Masahiro
    LEARNING HEALTH SYSTEMS, 2024