Interpretable machine learning models for crime prediction

Cited by: 55
Authors
Zhang, Xu [1, 2]
Liu, Lin [2, 3]
Lan, Minxuan [4 ]
Song, Guangwen [2 ]
Xiao, Luzi [2 ]
Chen, Jianguo [2 ]
Affiliations
[1] Guangzhou Univ, Sch Comp Sci & Cyber Engn, Guangzhou, Peoples R China
[2] Guangzhou Univ, Ctr Geoinformat Publ Secur, Sch Geog Sci, Guangzhou, Peoples R China
[3] Univ Cincinnati, Dept Geog, Cincinnati, OH USA
[4] Univ Findlay, Dept Justice Sci, Findlay, OH USA
Keywords
Crime prediction; Machine learning; XGBoost; Model interpretability; SHAP value; HOT-SPOTS; IMPACT; CRIMINOLOGY; PREVENTION; TRENDS;
DOI
10.1016/j.compenvurbsys.2022.101789
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
The relationship between crime patterns and associated variables has drawn considerable attention, and these variables play a critical role in crime prediction. Traditional regression models can reveal the contribution of each variable, but they are not optimal for crime prediction. Machine learning models, in contrast, predict crime more effectively, but most of them cannot estimate the contribution of each individual variable. This study aims to overcome this limitation by taking advantage of the interpretability of advanced machine learning models. Based on routine activity theory and crime pattern theory, the study selects 17 variables for crime prediction. The XGBoost algorithm is adopted to train the prediction model. A post-hoc interpretation method, Shapley additive explanation (SHAP), is used to discern the contribution of individual variables. A variable with a higher SHAP value contributes more to the crime prediction model. In addition to the global model for the entire area, a local model is calibrated at each study unit, revealing the spatial variation of each variable's contribution. Among the 17 variables used in the model, the proportion of the non-local population and the ambient population aged 25-44 contribute more than the other variables to predicting crime. The larger the ambient population aged 25-44 in an area, the more public thefts occur there. Additionally, local SHAP values are mapped to show each variable's contribution to the crime prediction model across the study area. The local results can help the police address the most important factors at each location, while the global model identifies the important factors for the entire region.
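The idea of local and global SHAP contributions described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: it replaces the trained XGBoost model with a hypothetical hand-written function (`toy_model`), computes exact Shapley values by enumerating coalitions rather than using a tree-specific algorithm, and fills in "absent" features with a single baseline point (the mean of the made-up study units). The variable `bus_stops` and all numbers are invented for illustration only.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x, relative to a baseline point.

    Features outside a coalition take their baseline value. This is a
    simplifying assumption; SHAP libraries typically average over
    background data instead.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                without_i = [x[j] if j in coalition else baseline[j]
                             for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                # Marginal contribution of feature i to this coalition.
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def toy_model(v):
    """Hypothetical stand-in for a trained crime-count model."""
    nonlocal_share, ambient_25_44, bus_stops = v
    return (2.0 * nonlocal_share
            + 0.5 * ambient_25_44
            + 0.1 * nonlocal_share * bus_stops)


# Made-up study units: [non-local share, ambient pop. 25-44, bus stops].
units = [[0.6, 4.0, 2.0], [0.2, 1.0, 5.0], [0.4, 3.0, 0.0]]
baseline = [sum(col) / len(units) for col in zip(*units)]

# Local explanation: one vector of contributions per study unit.
local = [shapley_values(toy_model, u, baseline) for u in units]

# Global importance: mean absolute contribution of each variable.
global_importance = [sum(abs(row[k]) for row in local) / len(local)
                     for k in range(len(baseline))]

for unit, phi in zip(units, local):
    print(unit, [round(p, 3) for p in phi])
print("global:", [round(g, 3) for g in global_importance])
```

For each unit, the contributions sum to the model's prediction at that unit minus its prediction at the baseline, which is the additive property that makes the per-location values mappable. Averaging their absolute values across units gives one common way to rank variables for the whole region.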
Pages: 9