Towards a Unified Multi-Domain Multilingual Named Entity Recognition Model

Authors
Kulkarni, Mayank [2]
Preotiuc-Pietro, Daniel [1]
Radhakrishnan, Karthik [1]
Winata, Genta Indra [1]
Wu, Shijie [1]
Xie, Lingjue [1]
Yang, Shaohua [1]
Affiliations
[1] Bloomberg, New York, NY 10022 USA
[2] Amazon Alexa AI, Boston, MA USA
Source
17th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2023), 2023
Chinese Library Classification: TP18 [Artificial intelligence theory]
Subject classification codes: 081104; 0812; 0835; 1405
Abstract
Named Entity Recognition (NER) is a key Natural Language Processing task whose performance is sensitive to the choice of genre and language. A unified NER model that covers multiple genres and languages is more practical and efficient, because it can leverage commonalities across genres and languages. In this paper, we propose a novel setup for NER that includes multi-domain and multilingual training and evaluation across 13 domains and 4 languages. We explore a range of approaches to building a unified model using domain and language adaptation techniques. Our experiments highlight several nuances to consider when building a unified model: naive data pooling fails to obtain good performance; domain-specific adaptations are more important than language-specific ones; and including domain-specific adaptations in a unified model can reach performance close to that of training multiple dedicated monolingual models, at a fraction of their parameter count.
Pages: 2210-2219 (10 pages)