Machine Learning Algorithms for Crime Prediction under Indian Penal Code

Cited by: 3
Authors
Aziz R.M. [1]
Sharma P. [1]
Hussain A. [1]
Affiliations
[1] VIT Bhopal University, Bhopal-Indore Highway, Kothrikalan, Sehore, M.P., Bhopal
Keywords
Decision tree regression (DTR); Indian Penal Code (IPC); Mean absolute percentage error (MAPE); Natural language processing (NLP); Random forest regression (RFR); Support vector regression (SVR)
DOI
10.1007/s40745-022-00424-6
Abstract
In this paper, the authors propose a data-driven approach to extract insights from Indian crime data. The approach can help police and other law enforcement bodies in India control and prevent crime region-wise. Several regression models are built using different regression algorithms, viz., random forest regression (RFR), decision tree regression (DTR), multiple linear regression (MLR), simple linear regression (SLR), and support vector regression (SVR), after pre-processing the data using MySQL Workbench and R. Given the desired inputs, these models can predict counts for 28 different types of Indian Penal Code (IPC) cognizable crime, as well as the total IPC cognizable crime count, region-wise, state-wise, and year-wise (for the whole country). Data visualization techniques, namely chord diagrams and map plots, are used to visualize the pre-processed data (for the years 2014 to 2020) and the data predicted for the year 2022 by the best-performing regression model. For the chosen data, the RFR model predicting total IPC cognizable crime fits best, with an adjusted R-squared value of 0.96 and a MAPE value of 0.2. Among the models predicting region-wise theft crime counts, the RFR-based model also fits best, with an adjusted R-squared value of 0.96 and a MAPE value of 0.166. The models predict that Andhra Pradesh will have the highest crime counts, with Adilabad district at the top at 31,933 predicted crimes. © The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2022.
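The abstract compares models by adjusted R-squared and MAPE. As a reading aid, the following is a minimal Python sketch of how these two metrics are conventionally computed. It is not the authors' code (the paper reports using R), the function names and sample numbers are hypothetical, and it assumes MAPE is expressed as a fraction; the record does not say whether the reported 0.2 is a fraction or a percentage.

```python
def mape(actual, predicted):
    """Mean absolute percentage error as a fraction (assumption: 0.2 means 20%)."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    # Each term is the absolute error relative to the true value.
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def adjusted_r2(actual, predicted, n_features):
    """Adjusted R-squared: R-squared penalised for the number of predictors."""
    n = len(actual)
    if n - n_features - 1 <= 0:
        raise ValueError("need more observations than predictors plus one")
    mean_actual = sum(actual) / n
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)           # total sum of squares
    r2 = 1 - ss_res / ss_tot
    return 1 - (1 - r2) * (n - 1) / (n - n_features - 1)


# Hypothetical crime counts, for illustration only (not data from the paper).
actual_counts = [100, 200, 300, 400]
predicted_counts = [110, 190, 330, 380]

print(mape(actual_counts, predicted_counts))
print(adjusted_r2(actual_counts, predicted_counts, n_features=1))
```

A lower MAPE and a higher adjusted R-squared both indicate a closer fit, which is the sense in which the abstract ranks the RFR model above the others.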
Pages: 379-410
Page count: 31