Text mining and machine learning for crime classification: using unstructured narrative court documents in police academic

Cited by: 0

Authors:

Bifari, Ezdihar [1,2]

Basbrain, Arwa [1]

Mirza, Rsha [1]

Bafail, Alaa [1]

Albaeadie, Somayah [1]

Alhalabi, Wadee [1,2]

Affiliations:

[1] King Abdulaziz University, Faculty of Computing and Information Technology, Department of Computer Science, Jeddah 21589, Saudi Arabia

[2] King Abdulaziz University, Immersive Virtual Reality Research Group, Jeddah, Saudi Arabia

Source:

COGENT ENGINEERING | 2024 / Vol. 11 / Issue 01

Keywords:

Crime scene; crime classification; text mining; machine learning; unstructured data; the CAP dataset; legal documents;

DOI:

10.1080/23311916.2024.2359850

Chinese Library Classification (CLC) number:

T [Industrial Technology]

Subject classification code:

08

Abstract:

This paper proposes a novel approach to using open-source legal databases in academic education, especially in law and police investigation. Our framework organizes and analyzes these data and extracts reports associated with crime scenes, addressing the challenge of classifying unstructured legal documents with text mining, natural language processing, and machine learning techniques. We developed a supervised machine learning model that classifies court documents with two classifiers: the first identifies documents that contain crime scenes, and the second assigns those documents to one of five crime types. The experimental results were promising: the random forest algorithm achieved an accuracy of 91.07% for the first classifier, and support vector machines achieved an accuracy of 82.46% for the second. What distinguishes our work is a crime dictionary of 70 crime tools and 151 related terms extracted from various forensic sources. Although relatively small, the dictionary contributed to good classification results. The proposed crime dictionary can be generalized, extended, used in advanced searches, and integrated with police databases to improve crime scene analysis. Overall, the research highlights the use of court databases in police academic education and seeks to use them more effectively.
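
The abstract gives no implementation details, so the following is only a minimal, standard-library Python sketch of the general idea it describes: dictionary-based features feeding a two-stage cascade, where the second classifier sees only documents the first accepts as crime scenes. Everything here is an assumption for illustration: the tiny dictionary, the crime-type labels `homicide` and `burglary`, the toy sentences, and all names (`dictionary_features`, `NearestCentroid`, `two_stage_classify`). A nearest-centroid classifier stands in for the random forest and SVM the authors report; it is not their method.

```python
import re
from collections import Counter
from math import fsum
from typing import Optional

# HYPOTHETICAL miniature crime dictionary. The paper's dictionary
# (70 crime tools, 151 related terms) is not reproduced here.
CRIME_TOOLS = {"knife", "gun", "pistol", "rope", "crowbar"}
CRIME_TERMS = {"victim", "wound", "blood", "stabbed", "shot",
               "burglary", "stolen", "forced"}
VOCAB = sorted(CRIME_TOOLS | CRIME_TERMS)  # fixed feature order


def tokenize(text: str) -> list[str]:
    """Lowercase alphabetic tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z]+", text.lower())


def dictionary_features(text: str) -> list[float]:
    """Count occurrences of each dictionary entry, in VOCAB order."""
    counts = Counter(t for t in tokenize(text) if t in CRIME_TOOLS or t in CRIME_TERMS)
    return [float(counts[word]) for word in VOCAB]


class NearestCentroid:
    """Stand-in classifier (NOT random forest or SVM): predicts the label
    whose mean training vector is closest in squared Euclidean distance."""

    def fit(self, X: list[list[float]], y: list[str]) -> "NearestCentroid":
        totals: dict[str, list[float]] = {}
        sizes = Counter(y)
        for vec, label in zip(X, y):
            acc = totals.setdefault(label, [0.0] * len(vec))
            for i, value in enumerate(vec):
                acc[i] += value
        self.centroids = {lab: [s / sizes[lab] for s in acc] for lab, acc in totals.items()}
        return self

    def predict_one(self, vec: list[float]) -> str:
        def sq_dist(centroid: list[float]) -> float:
            return fsum((a - b) ** 2 for a, b in zip(vec, centroid))
        return min(self.centroids, key=lambda lab: sq_dist(self.centroids[lab]))


def two_stage_classify(text: str, stage1: NearestCentroid,
                       stage2: NearestCentroid) -> Optional[str]:
    """Stage 1 filters crime-scene documents; stage 2 assigns a crime type."""
    vec = dictionary_features(text)
    if stage1.predict_one(vec) != "crime_scene":
        return None  # not a crime-scene document, so stage 2 is skipped
    return stage2.predict_one(vec)


# Toy sentences written for this sketch (not drawn from the CAP dataset).
STAGE1_DOCS = [
    ("The victim was stabbed with a knife and blood covered the floor.", "crime_scene"),
    ("The burglary left the door forced open with a crowbar.", "crime_scene"),
    ("The appeal concerns the interpretation of a lease agreement.", "other"),
    ("The court reviewed the tax assessment procedure.", "other"),
]
STAGE2_DOCS = [  # hypothetical crime-type labels
    ("The victim was stabbed with a knife and blood covered the floor.", "homicide"),
    ("The victim was shot with a pistol and the wound was fatal.", "homicide"),
    ("The burglary left the door forced open with a crowbar.", "burglary"),
    ("Jewelry was stolen after the window was forced.", "burglary"),
]


def train(docs: list[tuple[str, str]]) -> NearestCentroid:
    return NearestCentroid().fit([dictionary_features(t) for t, _ in docs],
                                 [lab for _, lab in docs])


stage1 = train(STAGE1_DOCS)
stage2 = train(STAGE2_DOCS)
print(two_stage_classify("A knife wound and blood were found on the victim.", stage1, stage2))  # homicide
print(two_stage_classify("The court reviewed the lease agreement.", stage1, stage2))  # None
```

The point of the sketch is the cascade structure: a document that stage 1 rejects never reaches the crime-type classifier. In a fuller implementation, `NearestCentroid` could be replaced by trained random forest and SVM models, and the hand-written dictionary by a curated one such as the paper describes.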

Pages: 22

References: 52
