A Method of Named Entity Recognition for Tigrinya

Cited by: 1
Authors
Yohannes, Hailemariam Mehari [1]
Amagasa, Toshiyuki [2]
Affiliations
[1] Univ Tsukuba, Syst & Informat Engn, Tsukuba, Ibaraki, Japan
[2] Univ Tsukuba, Ctr Computat Sci, Tsukuba, Ibaraki, Japan
Source
APPLIED COMPUTING REVIEW | 2022 / Vol. 22 / No. 3
Keywords
Named entity recognition; POS tagging; pre-trained language model; low-resource language; semi-supervised learning
DOI
10.1145/3570733.3570737
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
This paper proposes a method for Named Entity Recognition (NER) for a low-resource language, Tigrinya, using a pre-trained language model. Tigrinya is a morphologically rich language, yet it remains one of the underrepresented languages in the field of NLP. This is mainly due to the limited amount of annotated data available. To address this problem, we present the first publicly available NER datasets for Tigrinya, in two manually annotated versions, V1 and V2. The V1 and V2 datasets contain 69,309 and 40,627 tokens, respectively, and the annotations follow the CoNLL 2003 Beginning, Inside, and Outside (BIO) tagging scheme. Specifically, we develop a new pre-trained language model for Tigrinya based on RoBERTa, which we refer to as TigRoBERTa. The model is then fine-tuned on the target downstream tasks, NER and POS tagging, with limited data. Finally, we further enhance the model performance by applying semi-supervised self-training using unlabeled data. The experimental results show that the method achieves an F1-score of 84% for NER and an accuracy of 92% for POS tagging, which is better than or comparable to the baseline method based on CNN-BiLSTM-CRF.
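As an illustration of the BIO tagging scheme and the entity-level F1-score mentioned in the abstract, the following is a minimal Python sketch and not the authors' evaluation code. The function names `bio_to_spans` and `entity_f1`, the example tag sequences, and the choice to ignore an `I-` tag that does not continue an entity of the same type are all assumptions made for illustration. The labels `PER` and `ORG` are CoNLL 2003 entity types; the exact tag set of the Tigrinya datasets is not stated in this record.

```python
from typing import List, Tuple

# (entity type, start index, end index exclusive); hypothetical helper type
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Collect entity spans from one sentence of BIO tags."""
    spans: List[Span] = []
    start, label = None, None
    # A trailing "O" sentinel closes an entity that ends the sentence.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if continues:
            continue
        if label is not None:
            spans.append((label, start, i))
        start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
        # Assumption: an I- tag with no matching open entity is ignored.
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Entity-level F1: a predicted span counts only on an exact match."""
    tp = n_gold = n_pred = 0
    for g, p in zip(gold, pred):
        gold_spans, pred_spans = set(bio_to_spans(g)), set(bio_to_spans(p))
        tp += len(gold_spans & pred_spans)
        n_gold += len(gold_spans)
        n_pred += len(pred_spans)
    if tp == 0:
        return 0.0
    precision, recall = tp / n_pred, tp / n_gold
    return 2 * precision * recall / (precision + recall)


# Hypothetical five-token sentence: a person, then a two-token organization.
gold_tags = ["B-PER", "O", "B-ORG", "I-ORG", "O"]
pred_tags = ["B-PER", "O", "B-ORG", "O", "O"]  # organization cut short

print(bio_to_spans(gold_tags))
print(entity_f1([gold_tags], [pred_tags]))
```

In this sketch the truncated organization span counts as both a missed gold entity and a spurious prediction, which is why entity-level F1 is stricter than per-token accuracy.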
Pages: 56-68
Number of pages: 13