A machine learning tool for collecting and analyzing subjective road safety data from Twitter

Cited by: 7
Authors
Abedi, Mohammad Majid [1]
Sacchi, Emanuele [1]
Affiliations
[1] Univ Saskatchewan, Dept Civil Geol & Environm Engn, 57 Campus Dr, Saskatoon, SK S7N 5A9, Canada
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
Transportation engineering; Subjective safety; Crowdsourcing; Machine learning; Natural language processing; Twitter; Incident detection
DOI
10.1016/j.eswa.2023.122582
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Several measures have been proposed in the road safety field to measure safety objectively. However, studies focused on measuring subjective road safety (i.e., drivers' perception of the safety of the road environment) are limited because subjective safety data are scarce and difficult to collect. Still, subjective safety is important and should be monitored as closely as objective safety, because it can negatively impact road users' mobility and because road agencies draw on it when determining policies (ultimately affecting collision occurrence). Nowadays, big data related to subjective road safety can be crowdsourced from social media platforms like Twitter. Therefore, this research aims to develop a tool for extracting, classifying, and studying drivers' affective states from road safety-related tweets using keyword filtering, geo-boundaries, natural language processing, and machine-learning (ML) classification. The ML classification algorithms used in this study were naive Bayes (NB), logistic regression (LR), support vector machine (SVM), random forest (RF), and extreme gradient boosting (XGBoost). Word count, word-level TF-IDF, N-gram-level TF-IDF, and character-level TF-IDF were used as features. Metro Vancouver (Canada) was selected as the geographic region for tweet extraction and for the creation of training and test data sets. The study focused on the year 2019, for which 13,226 unique tweets were extracted after removing duplicates. The performance of the proposed ML models was compared by estimating accuracy, precision, recall, and F1-scores. The results showed that the trained RF model with count vectors performed best in separating road safety-related from unrelated tweets (accuracy = 0.935, F1-score = 0.937), and that the SVM classifier with word-level TF-IDF performed best in determining the proposed classification tags (accuracy = 0.881, F1-score = 0.879). Finally, sentiment analysis was conducted to investigate the polarity of tweets in each group of the proposed classification to better understand drivers' affective states.
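The filtering, classification, and evaluation steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a pure-Python multinomial naive Bayes with Laplace smoothing over word counts, and the keyword set, the toy tweets, and the function names (`keyword_filter`, `train_nb`, `predict_nb`, `binary_metrics`) are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy tweets (NOT from the study): 1 = road safety-related, 0 = unrelated.
TRAIN = [
    ("scary near miss on the highway today", 1),
    ("driver ran a red light almost hit me", 1),
    ("dangerous intersection needs a stop sign", 1),
    ("speeding cars make this road unsafe", 1),
    ("great coffee at the new cafe downtown", 0),
    ("watching the hockey game tonight", 0),
    ("new phone arrived today love it", 0),
    ("sunny day at the beach with friends", 0),
]
TEST = [
    ("unsafe driver almost hit a cyclist on the road", 1),
    ("love the coffee at the beach cafe", 0),
]


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for real NLP preprocessing)."""
    return text.lower().split()


def keyword_filter(tweets, keywords):
    """Keep tweets containing at least one keyword (assumed, simplified filter)."""
    return [t for t in tweets if keywords & set(tokenize(t))]


def train_nb(samples):
    """Collect per-class document counts and word counts (word-count features)."""
    class_docs, word_counts, vocab = Counter(), defaultdict(Counter), set()
    for text, label in samples:
        class_docs[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return class_docs, word_counts, vocab


def predict_nb(model, text, alpha=1.0):
    """Return the class with the highest log-posterior; unseen words are skipped."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_docs.items():
        total = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)  # log prior
        for tok in tokenize(text):
            if tok in vocab:
                # Laplace-smoothed log likelihood of the token given the class
                score += math.log(
                    (word_counts[label][tok] + alpha) / (total + alpha * len(vocab))
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


model = train_nb(TRAIN)
predictions = [predict_nb(model, text) for text, _ in TEST]
metrics = binary_metrics([label for _, label in TEST], predictions)
```

The same structure would extend to the other feature sets in the abstract (for example, replacing raw counts with TF-IDF weights) and to the other classifiers, which in practice would more likely come from an established ML library than from hand-written code.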
Pages: 13