Crash narrative classification: Identifying agricultural crashes using machine learning with curated keywords

Cited by: 3
Authors
Kim, Jisung [1]
Trueblood, Amber Brooke [2]
Kum, Hye-Chung [3]
Shipp, Eva M. [2]
Affiliations
[1] Texas A&M Transportat Inst, Mobil Div, Transportat Planning, College Stn, TX USA
[2] Texas A&M Transportat Inst, Ctr Transportat Safety, Crash Analyt Team, College Stn, TX USA
[3] Texas A&M Univ, Sch Publ Hlth, Dept Hlth Policy & Management, Populat Informat Lab, College Stn, TX USA
Keywords
Machine learning; crash narratives; agricultural crashes; bag-of-words; document classification algorithms; FARM EQUIPMENT
DOI
10.1080/15389588.2020.1836365
Chinese Library Classification number
R1 [Preventive Medicine, Hygiene]
Discipline classification codes
1004; 120402
Abstract
Objective: Traditionally, structured or coded data fields from a crash report are the basis for identifying crashes involving different types of vehicles, such as farm equipment. However, using only the structured data can lead to misclassification of vehicle or crash type. The objective of the current article is to examine the use of machine learning methods for identifying agricultural crashes based on the crash narrative and to transfer the application of models to different settings (e.g., future years of data, other states).

Methods: Different data representations (e.g., bag-of-words [BoW], bag-of-keywords [BoK]) and document classification algorithms (e.g., support vector machine [SVM], multinomial naive Bayes classifier [MNB]) were explored using Texas and Louisiana crash narratives across different time periods.

Results: The BoK-support vector classifier (SVC), BoK-MNB, and BoW-SVC models trained with Texas data were better predictive models than the baseline rule-based algorithm on the future year test data, with F1 scores of 0.88, 0.89, and 0.85, respectively, vs. 0.84. The BoK-MNB model trained with Louisiana data performed the closest to the baseline rule-based algorithm on the future year test data (F1 scores: 0.91 for the baseline vs. 0.89 for BoK-MNB). The BoK-SVC and BoK-MNB models trained with Texas and Louisiana data were better predictive models for the Texas future year test data, with F1 scores of 0.89 and 0.90 vs. 0.84. The BoK-MNB model trained with both states' data was a better predictive model for the Louisiana future year test data, with an F1 score of 0.94 vs. 0.91.

Conclusions: The findings of this study support that machine learning methodologies can potentially reduce the amount of human effort required to develop keyword lists and manually review narratives.
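The two data representations and the F1 metric named in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration only: the keyword list, the toy narratives and labels, and the "flag if any keyword appears" rule are hypothetical and are not the study's curated keywords, data, or baseline algorithm. No classifier is trained; the sketch only shows how BoW and BoK vectors differ and how an F1 score is computed.

```python
import re
from collections import Counter

# Hypothetical keyword list for illustration; not the study's curated list.
KEYWORDS = ["tractor", "combine", "hay", "farm", "plow"]

def tokenize(narrative):
    """Lowercase a narrative and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", narrative.lower())

def bag_of_words(narratives):
    """Return (vocabulary, count vectors) over every token in the corpus."""
    counts = [Counter(tokenize(n)) for n in narratives]
    vocab = sorted(set().union(*counts))
    return vocab, [[c[w] for w in vocab] for c in counts]

def bag_of_keywords(narratives, keywords=KEYWORDS):
    """Return count vectors restricted to the curated keywords."""
    counts = [Counter(tokenize(n)) for n in narratives]
    return [[c[k] for k in keywords] for c in counts]

def rule_based_flag(bok_vector):
    """Assumed baseline rule: flag a crash if any keyword appears."""
    return int(sum(bok_vector) > 0)

def f1_score(y_true, y_pred):
    """F1 for the positive class (label 1): 2TP / (2TP + FP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# Toy narratives (invented); 1 = agricultural crash, 0 = other crash.
narratives = [
    "Unit 1 struck a tractor towing a hay trailer.",
    "Unit 1 rear-ended unit 2 at a red light.",
    "Unit 2 hit the gate of a farm supply store.",
    "Unit 1 collided with a slow-moving sprayer.",
]
labels = [1, 0, 0, 1]

vocab, bow = bag_of_words(narratives)
bok = bag_of_keywords(narratives)
predictions = [rule_based_flag(v) for v in bok]
score = f1_score(labels, predictions)
```

In this toy data the assumed rule produces one false positive (the "farm supply store" narrative) and one false negative (the sprayer, which is not on the hypothetical keyword list). These are the kinds of errors a classifier trained on BoW or BoK vectors might reduce.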
Pages: 74-78
Page count: 5