Arabic natural language processing: An overview

Cited by: 73
Authors
Guellil, Imane [1]
Saadane, Houda [2]
Azouaou, Faical [1]
Gueni, Billel [3]
Nouvel, Damien [4]
Affiliations
[1] Ecole Natl Super Informat, Lab Methodes Concept Syst, BP 68M, Oued Smar 16309, Alger, Algeria
[2] GEOLSemantics, 12 Ave Raspail, F-94250 Gentilly, France
[3] Responsable Rech & Innovat Altran IT, Paris, France
[4] Inalco, Paris, France
Keywords
Arabic; MSA; AD; CA; Arabizi; Basic analysis; Identification; Building Resources; Machine translation; Sentiment analysis; Transliteration; Corpus
DOI
10.1016/j.jksuci.2019.02.006
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Arabic is recognised as the 4th most used language of the Internet. It has three main varieties: (1) Classical Arabic (CA), (2) Modern Standard Arabic (MSA) and (3) Arabic Dialect (AD). MSA and AD can be written either in Arabic script or in Roman script (Arabizi), that is, Arabic written with Latin letters, numerals and punctuation. Because of the complexity of the language and the number of challenges it poses for NLP, many surveys have been conducted to synthesise the work done on Arabic. However, these surveys focus principally on two varieties of Arabic (MSA and AD, written in Arabic letters only), and they are somewhat dated (no such survey since 2015), so they do not cover recent resources and tools. To bridge this gap, we propose a survey of 90 recent research papers (74% of which were published after 2015). Our study presents and classifies the work done on the three varieties of Arabic, covering both Arabic script and Arabizi, and links each work to its publicly available resources where they exist. (c) 2019 The Authors. Production and hosting by Elsevier B.V. on behalf of King Saud University. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Pages: 497-507
Number of pages: 11