Annotated Corpus with Negation and Speculation in Arabic Review Domain: NSAR

Cited by: 0
Authors
Mahany, Ahmed [1]
Khaled, Heba [1]
Elmitwally, Nouh Sabri [2, 3]
Aljohani, Naif [4]
Ghoniemy, Said [1]
Affiliations
[1] Ain Shams Univ, Fac Comp & Informat Sci, Cairo 11566, Egypt
[2] Birmingham City Univ, Sch Comp & Digital Technol, Birmingham B4 7XG, England
[3] Cairo Univ, Fac Comp & Artificial Intelligence, Giza 12613, Egypt
[4] King Abdulaziz Univ, Fac Comp & Informat Technol, Jeddah 21589, Saudi Arabia
Keywords
Arabic NLP; negation; speculation; uncertainty; annotation; annotation guidelines; corpus; review domain; sentiment analysis
DOI
Not available
Chinese Library Classification
TP301 [Theory and methods]
Discipline classification code
081202
Abstract
Negation and speculation detection are critical for Natural Language Processing (NLP) tasks such as sentiment analysis, information retrieval, and machine translation. This paper presents the first Arabic corpus in the review domain annotated with negation and speculation. The Negation and Speculation Arabic Review (NSAR) corpus consists of 3K review sentences randomly selected from three well-known, benchmarked Arabic corpora. It contains reviews from different categories, including books, hotels, restaurants, and other products, written in various Arabic dialects. The negation and speculation keywords have been annotated along with their linguistic scope, following annotation guidelines reviewed by an expert linguist. The inter-annotator agreement between two independent annotators, both native Arabic speakers, is measured using Cohen's Kappa coefficient, with values of 95 and 80 for negation and speculation, respectively. Furthermore, 29% of the corpus includes at least one negation instance, while only 4% contains speculative content. Arabic reviews therefore rely more on negation structures than on speculation. The corpus will be made available to the Arabic research community to help handle these critical phenomena.
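As an illustration of the agreement measure named in the abstract, the following is a minimal sketch of Cohen's Kappa for two annotators, computed as (observed agreement - chance agreement) / (1 - chance agreement). The function name `cohen_kappa`, the `NEG`/`O` label set, and the ten example labels are hypothetical and are not taken from the NSAR corpus or its annotation guidelines.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's Kappa for two annotators labelling the same items.

    Hypothetical helper for illustration; not the authors' code.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equally long")

    n = len(labels_a)

    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: sum over labels of the product of each
    # annotator's marginal probability for that label.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Made-up sentence-level labels: "NEG" = contains negation, "O" = does not.
annotator_a = ["NEG", "NEG", "O", "O", "O", "NEG", "O", "O", "O", "O"]
annotator_b = ["NEG", "NEG", "O", "O", "O", "O", "O", "O", "O", "O"]

kappa = cohen_kappa(annotator_a, annotator_b)
print(f"kappa = {kappa:.3f}")
```

In this toy example the annotators agree on 9 of 10 items, but because most items carry the majority label, chance agreement is already 0.62, so Kappa comes out well below the raw agreement rate.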
Pages: 38-46
Page count: 9