Mining Twitter data for crime trend prediction

Cited by: 15
Authors
Aghababaei, Somayyeh [1]
Makrehchi, Masoud [1]
Affiliations
[1] Univ Ontario Inst Technol, Dept Elect Comp & Software Engn, Oshawa, ON, Canada
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
Twitter data analytics; temporal data analytics; text mining; topic modeling; sentiment analysis; social trend prediction;
DOI
10.3233/IDA-163183
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
While conventional crime prediction methods rely on historical crime records and geographical information about the location of interest, we ask whether social media context can provide socio-behavioral "signals" for crime prediction. The hypothesis is that publicly available crowd data on Twitter may include predictive variables that indicate changes in crime rates, without being limited by the availability of historical crime records for specific locations. We developed a model for crime trend prediction whose objective is to use Twitter content to predict the direction of crime rates in a prospective time frame. The model employs content, sentiment, and topics as predictive indicators to infer changes in crime indexes. Because the problem is sequential, we propose a temporal topic detection model to infer predictive topics over time. The main challenge of topic detection over time is information evolution: data are more closely related when they are close in time than when they are further apart. Our proposed topic detection model builds a dynamic vocabulary to detect emerging topics rather than treating the vocabulary in bulk. We applied our model to data collected from Chicago for crime trend prediction using historical tweets. The results reveal a correlation between content-based features (features extracted from tweet content) and crime trends. Moreover, the results indicate the feasibility of our proposed temporal topic detection model for identifying the most predictive features over time, compared with a static model that does not consider time. We also studied the contribution of socio-economic indexes and temporal features as auxiliary features. The experiments show that the content-based features improve prediction performance significantly compared with the auxiliary features.
Overall, the study provides insight into the correlation between language and crime trends and into the impact of social data as an additional source of predictive indicators.
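The abstract's idea of a dynamic vocabulary, rebuilt as time advances instead of fixed once over the whole corpus, can be illustrated with a toy sketch. This is not the authors' topic detection model: the function names, the sliding-window rule, the frequency-based ranking, and the sample tweets are all assumptions made for illustration only.

```python
from collections import Counter

# Hypothetical toy data: one list of tweet texts per time period.
docs_by_period = [
    ["traffic jam downtown", "traffic again"],
    ["traffic slow", "concert downtown"],
    ["robbery downtown", "robbery reported robbery"],
]

def window_vocabulary(docs_by_period, t, window=2, top_k=3):
    """Return the top_k most frequent tokens in periods t-window+1 .. t."""
    counts = Counter()
    start = max(0, t - window + 1)
    for period in docs_by_period[start:t + 1]:
        for doc in period:
            counts.update(doc.lower().split())
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [token for token, _ in ranked[:top_k]]

def emerging_terms(docs_by_period, t, window=2, top_k=3):
    """Tokens in the vocabulary at period t that were absent at period t-1."""
    if t == 0:
        return []
    current = window_vocabulary(docs_by_period, t, window, top_k)
    previous = set(window_vocabulary(docs_by_period, t - 1, window, top_k))
    return [token for token in current if token not in previous]

print(emerging_terms(docs_by_period, 2, window=1, top_k=2))
```

With a one-period window and a two-term vocabulary, the term that newly enters the vocabulary in the last period is reported as emerging, whereas a single bulk vocabulary over all periods would not distinguish when a term became prominent.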
Pages: 117-141
Page count: 25