Crime profiling for the Arabic language using computational linguistic techniques

Cited by: 11
Authors
Alruily, Meshrif [1]
Ayesh, Aladdin [2]
Zedan, Hussein [2]
Affiliations
[1] Al Jouf Univ, Al Jouf, Sakaka, Saudi Arabia
[2] De Montfort Univ, Software Technol Res Lab, Leicester LE1 9BH, Leics, England
Keywords
Arabic language; Crime domain; Pattern recognition; Clustering; Information extraction; Syntactic analysis; Named entity recognition; Extraction; Events; System
DOI
10.1016/j.ipm.2013.09.001
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Arabic is a widely spoken language, but few mining tools have been developed to process Arabic text. This paper examines the crime domain in unstructured Arabic text using text mining techniques. The development and application of a Crime Profiling System (CPS) are presented. The system extracts meaningful information, in this case the type of crime, location and nationality, from Arabic-language crime news reports. The system has two unique attributes: first, information extraction that depends on local grammar, and second, dictionaries that can be generated automatically. The CPS is shown to improve the quality of the data through reduction, retaining only meaningful information. Moreover, the Self Organising Map (SOM) approach is adopted to cluster the crime reports by crime type. This clustering is improved because only refined data, containing the meaningful keywords produced by the information extraction process, are fed into it; that is, the data are cleansed by removing noise. The proposed system is validated through experiments using a corpus, collated from different sources, that was not used during system development. Precision, recall and F-measure are used to evaluate the performance of the proposed information extraction approach, and comparisons are made with other systems. Three parameters are used to evaluate the clustering performance: data size, loading time and quantization error. (C) 2013 Elsevier Ltd. All rights reserved.
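The abstract names precision, recall and F-measure as the evaluation metrics for information extraction. A minimal sketch of how these are commonly computed over sets of extracted items is shown here; the function name `precision_recall_f1` and the sample (field, value) pairs are illustrative assumptions, not taken from the paper, and the balanced F1 form of the F-measure is assumed.

```python
def precision_recall_f1(extracted, gold):
    """Compare extracted items against a gold standard (both treated as sets)."""
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)  # items extracted and also correct
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Balanced F-measure (F1): harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical (field, value) pairs, for illustration only.
gold = {("crime", "theft"), ("location", "Riyadh"), ("nationality", "Saudi")}
extracted = {("crime", "theft"), ("location", "Riyadh"), ("location", "Jeddah")}

p, r, f = precision_recall_f1(extracted, gold)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

Treating the outputs as sets means each distinct extracted item is counted once; the paper may score matches differently.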
Pages: 315-341
Number of pages: 27