Text mining and machine learning for crime classification: using unstructured narrative court documents in police academic

Cited by: 0
Authors
Bifari, Ezdihar [1, 2]
Basbrain, Arwa [1]
Mirza, Rsha [1]
Bafail, Alaa [1]
Albaeadie, Somayah [1]
Alhalabi, Wadee [1, 2]
Affiliations
[1] King Abdulaziz Univ, Fac Comp & Informat Technol, Dept Comp Sci, Jeddah 21589, Saudi Arabia
[2] King Abdulaziz Univ, Immers Virtual Real Res Grp, Jeddah, Saudi Arabia
Source
COGENT ENGINEERING | 2024 / Vol. 11 / Issue 01
Keywords
Crime scene; crime classification; text mining; machine learning; unstructured data; the CAP dataset; legal documents
DOI
10.1080/23311916.2024.2359850
Chinese Library Classification (CLC) number
T [Industrial Technology]
Subject classification code
08
Abstract
This paper proposes a novel approach to using open-source legal databases in academic education, particularly in law and police investigation. Our framework organizes and analyzes this data and extracts reports associated with crime scenes, addressing the challenge of classifying unstructured legal documents through text mining, natural language processing, and machine learning. We developed a supervised machine learning model that accurately classifies court documents using two classifiers: the first identifies documents containing crime scenes, and the second assigns them to one of five crime types. The experimental results were promising: the random forest algorithm achieved an accuracy of 91.07% for the first classifier, and support vector machines achieved an accuracy of 82.46% for the second. What distinguishes our work is a crime dictionary of 70 crime tools and 151 related terms extracted from various forensic sources. Although relatively small, the dictionary contributed to good classification results. It can be generalized, extended, used in advanced searches, and integrated with police databases to improve crime scene analysis. Overall, the research highlights the use of court databases in police academic education and seeks to use them more effectively.
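The abstract describes a dictionary of crime tools and related terms feeding a two-stage classification (crime scene detection, then crime type). A minimal sketch of that idea follows, under stated assumptions: the dictionary entries, the function names `dictionary_features` and `classify`, and the feature layout are all hypothetical and are not taken from the paper, and the two models are passed in as plain callables standing in for trained classifiers such as the random forest and support vector machine the abstract mentions.

```python
import re
from collections import Counter

# Hypothetical mini-dictionary: illustrative entries only, NOT the paper's
# actual 70 crime tools and 151 related terms.
CRIME_TOOLS = {"knife", "pistol", "rope"}
RELATED_TERMS = {"stabbed", "shot", "blood", "wound"}


def dictionary_features(text):
    """Count how often dictionary entries occur in one document."""
    tokens = Counter(re.findall(r"[a-z]+", text.lower()))
    return {
        "tool_hits": sum(tokens[t] for t in CRIME_TOOLS),
        "term_hits": sum(tokens[t] for t in RELATED_TERMS),
    }


def classify(text, scene_model, type_model):
    """Two-stage cascade: detect a crime scene first, then assign a type.

    scene_model and type_model are assumed callables that take the feature
    dict; in practice they would wrap trained classifiers.
    """
    feats = dictionary_features(text)
    if not scene_model(feats):
        return None  # no crime scene detected, so stage two is skipped
    return type_model(feats)
```

In this sketch a document that fails the first stage never reaches the second, which mirrors the order of the two classifiers described in the abstract; how the paper actually combines dictionary terms with other text features is not stated in this record.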
Pages: 22