IMPROVING BIOMEDICAL NAMED ENTITY RECOGNITION WITH A UNIFIED MULTI-TASK MRC FRAMEWORK

Cited: 5
Authors
Tong, Yiqi [1 ,2 ]
Zhuang, Fuzhen [1 ,2 ]
Wang, Deqing [2 ]
Ying, Haochao [3 ]
Wang, Binling [4 ]
Affiliations
[1] Beihang Univ, Inst Artificial Intelligence, Beijing 100191, Peoples R China
[2] Beihang Univ, Sch Comp Sci, SKLSDE, Beijing 100191, Peoples R China
[3] Zhejiang Univ, Sch Publ Hlth, Hangzhou 310058, Peoples R China
[4] Xiamen Univ, Sch Informat, Xiamen 361005, Peoples R China
Source
2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2022
Funding
National Natural Science Foundation of China;
Keywords
Prior knowledge; Biomedical named entity recognition; Machine reading comprehension; Multi-task learning;
DOI
10.1109/ICASSP43922.2022.9746482
CLC classification number
O42 [Acoustics];
Discipline classification code
070206; 082403;
Abstract
Prior knowledge, such as expert rules and knowledge bases, has been proven effective in traditional Biomedical Named Entity Recognition (BioNER). However, most current neural BioNER systems use this external knowledge only for preprocessing or post-editing rather than incorporating it into the training process, so it cannot be learned by the model. To encode prior knowledge into the model, we present a unified multi-task Machine Reading Comprehension (MRC) framework for BioNER. Specifically, in the MRC task, the question sequences are derived from the standard BioNER dataset, and we introduce three kinds of prior knowledge into the query sequences: Wikipedia descriptions, the annotation scheme, and an entity dictionary. Our model then adopts a multi-task learning strategy to jointly train the main BioNER task and the auxiliary MRC task. Finally, experimental results on three benchmark datasets validate the superiority of our BioNER model over various state-of-the-art baselines.
Pages: 8332-8336
Page count: 5
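
As an illustration of the multi-task idea described in the abstract, the following Python sketch pairs a main token-tagging (BioNER) head with an auxiliary MRC span-prediction head on a shared encoder and combines the two losses for joint training. The class name MultiTaskBioNER, the toy Transformer encoder, the hidden size, tag count, and the loss weight alpha are all assumptions made for illustration; the paper's actual architecture (e.g. a pretrained biomedical encoder) and hyperparameters are not given in this record.

# Minimal sketch (assumptions, not the authors' released code): a shared encoder
# feeds (1) a main BioNER token-tagging head and (2) an auxiliary MRC
# span-prediction head whose query carries prior knowledge; their losses are
# combined for joint multi-task training.
import torch
import torch.nn as nn


class MultiTaskBioNER(nn.Module):
    def __init__(self, vocab_size=30522, hidden=256, num_tags=9, alpha=0.5):
        super().__init__()
        # Stand-in shared encoder; the paper would use a pretrained biomedical
        # language model here (this toy Transformer is an assumption).
        self.embed = nn.Embedding(vocab_size, hidden)
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.ner_head = nn.Linear(hidden, num_tags)  # main task: per-token tag logits
        self.mrc_head = nn.Linear(hidden, 2)         # auxiliary task: start/end logits
        self.alpha = alpha                           # weight of the auxiliary MRC loss
        self.ce = nn.CrossEntropyLoss()

    def forward(self, ner_ids, ner_tags, mrc_ids, start_pos, end_pos):
        # Main BioNER task: sequence labeling over the raw sentence.
        h_ner = self.encoder(self.embed(ner_ids))                   # (B, L, H)
        ner_loss = self.ce(self.ner_head(h_ner).transpose(1, 2), ner_tags)

        # Auxiliary MRC task: mrc_ids encodes "[query with prior knowledge] + sentence";
        # the model predicts the answer span's start and end positions.
        h_mrc = self.encoder(self.embed(mrc_ids))                   # (B, L, H)
        start_logits, end_logits = self.mrc_head(h_mrc).unbind(dim=-1)
        mrc_loss = 0.5 * (self.ce(start_logits, start_pos) + self.ce(end_logits, end_pos))

        # Joint multi-task objective.
        return ner_loss + self.alpha * mrc_loss


if __name__ == "__main__":
    model = MultiTaskBioNER()
    B, L = 2, 16
    loss = model(
        torch.randint(0, 30522, (B, L)),   # NER input ids
        torch.randint(0, 9, (B, L)),       # gold BIO tags
        torch.randint(0, 30522, (B, L)),   # query + sentence ids for MRC
        torch.randint(0, L, (B,)),         # gold span start positions
        torch.randint(0, L, (B,)),         # gold span end positions
    )
    loss.backward()
    print("joint loss:", float(loss))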