DAWQAS: A Dataset for Arabic Why Question Answering System

Cited by: 21
Authors
Ismail, Walaa Saber [1]
Homsi, Masun Nabhan [2]
Affiliations
[1] Emirates Coll Technol, Abu Dhabi, U Arab Emirates
[2] Univ Simon Bolivar, Caracas, Venezuela
Source
ARABIC COMPUTATIONAL LINGUISTICS | 2018 / Vol. 142
Keywords
DAWQAS; Why Question Answering System; Natural Language Processing; Machine Learning; Rhetorical Relations; Discourse Markers
DOI
10.1016/j.procs.2018.10.467
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A why question answering system is a tool designed to answer why-questions posed in natural language. Several papers have been published on the problem of answering why-questions. In particular, attempts have been made to analyze Arabic text and predict which passages are the best candidate answers to why-questions, but these attempts employed different datasets that were limited in size and not publicly available. To overcome these limitations, this paper introduces a new publicly available dataset, DAWQAS: Dataset for Arabic Why Question Answering System. It consists of 3205 why question-answer pairs. The pairs were first scraped from public Arabic websites; the texts were then preprocessed and converted to feature vectors. Afterwards, the why-answers were re-categorized based on their domains. Finally, rhetorical relation probabilities based on discourse markers were computed for each sentence in the dataset. DAWQAS is a valuable resource for research and evaluation in language understanding. The dataset is freely available at https://github.com/masun/DAWQAS. © 2018 The Authors. Published by Elsevier B.V.
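
The abstract's last processing step, scoring each sentence by rhetorical relation using discourse markers, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the marker lexicon `MARKERS`, the relation labels, the function name `relation_probabilities`, and the count-and-normalize scoring are hypothetical and are not taken from the DAWQAS paper or repository.

```python
from collections import Counter

# Hypothetical marker lexicon: relation label -> discourse markers.
# Illustrative only; this is NOT the marker list used to build DAWQAS.
MARKERS = {
    "cause": ["لأن", "بسبب"],   # "because", "because of"
    "purpose": ["لكي"],         # "in order to"
    "contrast": ["لكن"],        # "but"
}


def relation_probabilities(sentence, markers=MARKERS):
    """Return each relation's share of the marker hits found in a sentence.

    Assumed scoring: count whitespace-delimited tokens that exactly match
    a marker, then normalize the counts so they sum to 1. A sentence with
    no markers gets 0.0 for every relation.
    """
    tokens = sentence.split()
    counts = Counter()
    for relation, words in markers.items():
        counts[relation] = sum(tokens.count(word) for word in words)
    total = sum(counts.values())
    if total == 0:
        return {relation: 0.0 for relation in markers}
    return {relation: counts[relation] / total for relation in markers}


if __name__ == "__main__":
    # One "cause" marker and one "contrast" marker in the same sentence.
    print(relation_probabilities("تأخر القطار بسبب المطر لكن وصلنا"))
```

Exact token matching is a deliberate simplification: Arabic attaches conjunctions and prepositions to the following word (for example ولكن), so a realistic pipeline would likely need normalization or clitic segmentation before matching.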
Pages: 123-131
Page count: 9