Language Clustering for Multilingual Named Entity Recognition

Cited by: 0
Authors
Shaffer, Kyle [1]
Affiliation
[1] Language Weaver RWS Grp, Gerrards Cross, England
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recent work in multilingual natural language processing has shown progress on tasks such as natural language inference and joint multilingual translation. Despite this success in learning across many languages, multilingual training regimes often boost performance on some languages at the expense of others. For multilingual named entity recognition (NER), we propose a simple technique that groups similar languages together by using embeddings from a pre-trained masked language model and automatically discovering language clusters in this embedding space. Specifically, we fine-tune an XLM-RoBERTa model on a language identification task and use embeddings from this model for clustering. We conduct experiments on 15 diverse languages in the WikiAnn dataset and show that our technique largely outperforms three baselines: (1) training a multilingual model jointly on all available languages, (2) training one monolingual model per language, and (3) grouping languages by linguistic family. We also conduct analyses showing meaningful multilingual transfer for low-resource languages (Swahili and Yoruba), even though they are automatically grouped with other seemingly disparate languages.
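The clustering step described in the abstract can be illustrated with a minimal sketch. It is not the author's implementation. Everything here is an assumption made for illustration: the embeddings are synthetic random vectors standing in for sentence embeddings from a language-identification model, the language names are placeholders, each language is summarized by mean-pooling its sentence embeddings, and k-means with a deterministic farthest-point initialization is used because the abstract does not name a clustering algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# ASSUMPTION: synthetic stand-in for sentence embeddings from a
# language-ID model. Six placeholder languages are drawn around three
# well-separated group centers; no real language data is used.
languages = ["lang_a", "lang_b", "lang_c", "lang_d", "lang_e", "lang_f"]
true_group = np.array([0, 0, 1, 1, 2, 2])
dim, sentences_per_lang = 16, 40

group_centers = rng.normal(size=(3, dim)) * 10.0
lang_means = group_centers[true_group] + rng.normal(size=(len(languages), dim)) * 0.5
embeddings = np.concatenate(
    [mean + rng.normal(size=(sentences_per_lang, dim)) for mean in lang_means]
)
lang_ids = np.repeat(np.arange(len(languages)), sentences_per_lang)


def language_centroids(embeddings, lang_ids, n_langs):
    """Summarize each language by the mean of its sentence embeddings."""
    return np.stack([embeddings[lang_ids == i].mean(axis=0) for i in range(n_langs)])


def kmeans(points, k, n_iter=50):
    """Plain k-means with a deterministic farthest-point initialization."""
    centers = [points[0]]
    for _ in range(k - 1):
        # Distance from every point to its nearest already-chosen center.
        dists = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(dists)])
    centers = np.stack(centers)

    for _ in range(n_iter):
        pairwise = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = pairwise.argmin(axis=1)
        # Keep the old center if a cluster ends up empty.
        new_centers = np.stack([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


centroids = language_centroids(embeddings, lang_ids, len(languages))
labels = kmeans(centroids, k=3)

# Group language names by discovered cluster label.
clusters = {}
for name, label in zip(languages, labels):
    clusters.setdefault(int(label), []).append(name)
print(clusters)
```

In a pipeline like the one the abstract outlines, each discovered cluster would then presumably get its own NER model trained on the languages it contains; that step is omitted here.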
Pages: 40-45
Page count: 6
Related papers
50 in total
  • [1] Theoretical Linguistics Rivals Embeddings in Language Clustering for Multilingual Named Entity Recognition
    Imai, Sakura
    Kawahara, Daisuke
    Orita, Naho
    Oda, Hiromune
    PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL-SRW 2023, VOL 4, 2023 : 139 - 151
  • [2] On the Strength of Character Language Models for Multilingual Named Entity Recognition
    Yu, Xiaodong
    Mayhew, Stephen
    Sammons, Mark
    Roth, Dan
    2018 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2018), 2018 : 3073 - 3077
  • [3] Multilingual Transformers for Named Entity Recognition
    Viksna, Rinalds
    Skadina, Inguna
    BALTIC JOURNAL OF MODERN COMPUTING, 2022, 10 (03) : 457 - 469
  • [4] Dataset Enhancement and Multilingual Transfer for Named Entity Recognition in the Indonesian Language
    Khairunnisa, Siti Oryza
    Chen, Zhousi
    Komachi, Mamoru
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2023, 22 (06)
  • [5] Named Entity Recognition an Aid to Improve Multilingual Entity Filling In Language-Independent Approach
    Bhagavatula, Mahathi
    Santosh, G. S. K.
    Varma, Vasudeva
    PROCEEDINGS OF THE FIRST WORKSHOP ON INFORMATION AND KNOWLEDGE MANAGEMENT FOR DEVELOPING REGION, 2012 : 3 - 9
  • [6] Using WordNet Predicates for Multilingual Named Entity Recognition
    Negri, Matteo
    Magnini, Bernardo
    GWC 2004: SECOND INTERNATIONAL WORDNET CONFERENCE, PROCEEDINGS, 2003 : 169 - 174
  • [7] Adaptive, multilingual named entity recognition in Web pages
    Petasis, G
    Karkaletsis, V
    Grover, C
    Hachey, B
    Pazienza, MT
    Vindigni, M
    Coch, J
    ECAI 2004: 16TH EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2004, 110 : 1073 - 1074
  • [8] Learning multilingual named entity recognition from Wikipedia
    Nothman, Joel
    Ringland, Nicky
    Radford, Will
    Murphy, Tara
    Curran, James R.
    ARTIFICIAL INTELLIGENCE, 2013, 194 : 151 - 175
  • [9] Multilingual Fine-Grained Named Entity Recognition
    Lupancu, Viorica-Camelia
    Iftene, Adrian
    COMPUTER SCIENCE JOURNAL OF MOLDOVA, 2023, 31 (03) : 321 - 339
  • [10] Named Entity Recognition for Mongolian Language
    Munkhjargal, Zoljargal
    Bella, Gabor
    Chagnaa, Altangerel
    Giunchiglia, Fausto
    TEXT, SPEECH, AND DIALOGUE (TSD 2015), 2015, 9302 : 243 - 251