The Construction of a New Lexicon Design for Arabic Language

Cited by: 0
Authors
Bataineh, Bilal [1]
Bataineh, Emad [1]
Affiliation
[1] Irbid Natl Univ, Irbid, Jordan
Source
BUSINESS TRANSFORMATION THROUGH INNOVATION AND KNOWLEDGE MANAGEMENT: AN ACADEMIC PERSPECTIVE, VOLS 3 AND 4 | 2010
Keywords
Natural Language Processing; Lexicon; Parser; Arabic Language
DOI
Not available
Chinese Library Classification
C93 [Management Science]
Discipline classification codes
12; 1201; 1202; 120202
Abstract
Analyzing Arabic sentences is a difficult task, and the difficulties come from several sources. One is that sentences are long and complex; another is sentence structure: parts of the syntactic structure may be missing, and words and phrases may appear in different orders. This paper aims to develop and assess an Arabic lexicon. The new automatic lexicon was developed to analyze and extract the attributes of Arabic words. It was implemented as a two-step process: tokenization followed by part-of-speech tagging. The output of the lexicon can be processed by a separate parser tool, which analyzes an Arabic sentence to determine whether it follows a valid grammatical structure. An evaluation was conducted to assess the effectiveness and efficiency of the new lexicon design using randomly selected real sentences. The results showed a minimum accuracy rate of 92%, which is considered highly satisfactory. The newly designed lexicon can be used in any application that requires Arabic language analysis and processing.
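The two-step process named in the abstract (tokenization, then part-of-speech tagging) can be illustrated with a minimal sketch. This is not the authors' implementation: the lexicon entries, the tag names, the `UNK` fallback, and the function names `tokenize`, `tag`, and `analyze` are all hypothetical, and the tokenizer simply assumes that a token is a run of basic Arabic letters and diacritics.

```python
import re

# Hypothetical toy lexicon mapping a surface form to a part-of-speech tag.
# These entries and tag names are illustrative assumptions, not data
# taken from the paper.
TOY_LEXICON = {
    "ذهب": "VERB",      # "went"
    "الولد": "NOUN",    # "the boy"
    "إلى": "PREP",      # "to"
    "المدرسة": "NOUN",  # "the school"
}

# Assumption: a token is a run of characters in U+0621..U+0652, which
# covers the basic Arabic letters and the common diacritics.
TOKEN_RE = re.compile(r"[\u0621-\u0652]+")


def tokenize(sentence):
    """Step 1: split a sentence into Arabic word tokens, dropping punctuation."""
    return TOKEN_RE.findall(sentence)


def tag(tokens, lexicon=TOY_LEXICON):
    """Step 2: look each token up in the lexicon; unknown words get 'UNK'."""
    return [(token, lexicon.get(token, "UNK")) for token in tokens]


def analyze(sentence, lexicon=TOY_LEXICON):
    """Run both steps and return a list of (token, tag) pairs."""
    return tag(tokenize(sentence), lexicon)
```

A list of (token, tag) pairs such as the one returned by `analyze` is the kind of output a downstream parser could consume. A real lexicon would also need morphological analysis (clitics, affixes, and so on), which this sketch deliberately leaves out.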
Pages: 2086-2096
Number of pages: 11