Wojood: Nested Arabic Named Entity Corpus and Recognition using BERT

Cited by: 0
Authors
Jarrar, Mustafa [1]
Khalilia, Mohammed [1]
Ghanem, Sana [1]
Affiliations
[1] Birzeit Univ, Birzeit, Palestine
Source
LREC 2022: THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION | 2022
Keywords
Named Entity Recognition; Multi-Task Learning; Nested Entities; BERT; Arabic NER Corpus;
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
This paper presents Wojood, a corpus for Arabic nested Named Entity Recognition (NER). Nested entities occur when one entity mention is embedded inside another entity mention. Wojood consists of about 550K Modern Standard Arabic (MSA) and dialect tokens that are manually annotated with 21 entity types, including person, organization, location, event and date. More importantly, the corpus is annotated with nested entities instead of the more common flat annotations. The data contains about 75K entities, 22.5% of which are nested. The inter-annotator evaluation of the corpus demonstrated strong agreement, with a Cohen's Kappa of 0.979 and an F1-score of 0.976. To validate our data, we used the corpus to train a nested NER model based on multi-task learning using the pre-trained AraBERT (Arabic BERT). The model achieved an overall micro F1-score of 0.884. Our corpus, the annotation guidelines, the source code and the pre-trained model are publicly available.
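To make the notion of nested annotation concrete, the following is a minimal sketch of how nested entity spans could be represented and how the share of nested entities might be counted. It is not the Wojood data format or the authors' code: the `Span` type, the helper names, the English example tokens, and the labels `ORG` and `GPE` are all illustrative assumptions, as is the definition of "nested" as a span lying within another annotated span.

```python
from typing import NamedTuple


class Span(NamedTuple):
    """Hypothetical entity span over token indices (not the Wojood schema)."""
    start: int  # index of the first token, inclusive
    end: int    # index one past the last token, exclusive
    label: str  # entity type; the labels used below are illustrative


def is_nested_in(inner: Span, outer: Span) -> bool:
    """Return True when `inner` lies within `outer` and is a different span."""
    return (
        inner != outer
        and outer.start <= inner.start
        and inner.end <= outer.end
    )


def nested_share(spans: list[Span]) -> float:
    """Fraction of spans that are embedded inside some other span.

    Assumption: an entity counts as nested when another annotated span
    covers it; the paper may define its 22.5% figure differently.
    """
    if not spans:
        return 0.0
    nested = sum(any(is_nested_in(s, o) for o in spans) for s in spans)
    return nested / len(spans)


# Illustrative English tokens; the corpus itself is Arabic.
tokens = ["Birzeit", "University", "in", "Palestine"]
spans = [
    Span(0, 2, "ORG"),  # "Birzeit University"
    Span(0, 1, "GPE"),  # "Birzeit", embedded in the ORG mention
    Span(3, 4, "GPE"),  # "Palestine", not embedded in anything
]

share = nested_share(spans)
print(f"{share:.3f}")
```

A flat annotation scheme would keep only one label per token, so the inner `GPE` mention for "Birzeit" would be lost; a nested scheme keeps both spans.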
Pages: 3626-3636
Page count: 11