SUAR: Towards Building a Corpus for the Saudi Dialect

Cited by: 17
Authors
Al-Twairesh, Nora [1]
Al-Matham, Rawan [1]
Madi, Nora [1]
Almugren, Nada [1]
Al-Aljmi, Al-Hanouf [1]
Alshalan, Shahad [1]
Alshalan, Raghad [1]
Alrumayyan, Nafla [1]
Al-Manea, Shams [1]
Bawazeer, Sumayah [1]
Al-Mutlaq, Nourah [1]
Almanea, Nada [1]
Bin Huwaymil, Waad [1]
Alqusair, Dalal [1]
Alotaibi, Reem [1]
Al-Senaydi, Suha [1]
Alfutamani, Abeer [1]
Affiliations
[1] King Saud Univ, Informat Technol Dept, Riyadh, Saudi Arabia
Source
ARABIC COMPUTATIONAL LINGUISTICS | 2018 / Vol. 142
Keywords
Arabic NLP; Saudi Arabic; Saudi corpus; morphological annotation; Arabic dialects;
DOI
10.1016/j.procs.2018.10.462
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper presents the preliminary results of the construction of a morphologically annotated corpus for the Saudi dialect. We call the corpus SUAR (SaUdi corpus for NLP Applications and Resources). The corpus consists of around 104,079 words collected from different online sources. The linguistic features of the Saudi dialect are elaborated and compared with Modern Standard Arabic and other Arabic dialects. This paper conducts a pilot study to explore possible directions to facilitate the morphological annotation of the Saudi corpus. The corpus was automatically annotated using the MADAMIRA tool, after which it was manually inspected to validate the resulting analysis. (C) 2018 The Authors. Published by Elsevier B.V.
Pages: 72-82
Page count: 11