Biomedical Named Entity Recognition via Knowledge Guidance and Question Answering

Cited by: 4
Authors
Banerjee, Pratyay [1]
Pal, Kuntal Kumar [1]
Devarakonda, Murthy [1]
Baral, Chitta [1]
Affiliations
[1] Arizona State Univ, Sch Comp Informat & Decis Syst Engn, 699 S Mill Ave, Tempe, AZ 85281 USA
Source
ACM TRANSACTIONS ON COMPUTING FOR HEALTHCARE | 2021 / Vol. 2 / No. 04
Keywords
Named entity recognition; NER; question answering; text tagging; BIO tagging; multitask training; BERT-CNN; biomedical; transfer learning;
DOI
10.1145/3465221
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
In this work, we formulated the named entity recognition (NER) task as a multi-answer knowledge-guided question-answer task (KGQA) and showed that the knowledge guidance helps to achieve state-of-the-art results for 11 of 18 biomedical NER datasets. We prepended five different knowledge contexts (entity types, questions, definitions, and examples) to the input text and trained and tested BERT-based neural models on such input sequences from a combined dataset of the 18 different datasets. This novel formulation of the task (a) improved named entity recognition and illustrated the impact of different knowledge contexts, (b) reduced system confusion by limiting prediction to a single entity class for each input token (i.e., B, I, O only) compared to multiple entity classes in traditional NER (i.e., B_entity1, B_entity2, I_entity1, I_entity2, O), (c) made detection of nested entities easier, and (d) enabled the models to jointly learn NER-specific features from a large number of datasets. We performed extensive experiments with this KGQA formulation on the biomedical datasets, and through these experiments we showed when knowledge improved named entity recognition. We analyzed the effect of the task formulation, the impact of the different knowledge contexts, the multi-task aspect of the generic format, and the generalization ability of KGQA. We also probed the model to better understand the key contributors to these improvements.
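The reformulation described in the abstract can be illustrated with a minimal sketch: a sentence tagged with type-specific BIO labels is turned into one example per entity type, each with a knowledge context prepended and plain B/I/O labels. This is not the authors' code. The function name `to_kgqa_examples`, the `[SEP]` separator, the question wording, and the choice to label context tokens as O are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def to_kgqa_examples(
    tokens: List[str],
    tags: List[str],
    contexts: Dict[str, str],
    sep: str = "[SEP]",  # assumed separator between context and text
) -> List[Tuple[str, List[str], List[str]]]:
    """Build one (entity_type, input_tokens, labels) example per entity type."""
    examples = []
    for entity_type, context in contexts.items():
        context_tokens = context.split()
        labels = []
        for tag in tags:
            # "B-Disease" -> ("B", "-", "Disease"); "O" -> ("O", "", "")
            prefix, _, tag_type = tag.partition("-")
            # Keep B/I only for the queried type; every other token becomes O.
            keep = tag_type == entity_type and prefix in ("B", "I")
            labels.append(prefix if keep else "O")
        # Assumption: context and separator tokens are labeled O here;
        # a real system might instead exclude them from the loss.
        input_tokens = context_tokens + [sep] + tokens
        input_labels = ["O"] * (len(context_tokens) + 1) + labels
        examples.append((entity_type, input_tokens, input_labels))
    return examples


# Hypothetical sentence and question-style knowledge contexts.
tokens = ["Aspirin", "reduces", "heart", "attack", "risk"]
tags = ["B-Chemical", "O", "B-Disease", "I-Disease", "O"]
contexts = {
    "Chemical": "What chemicals are mentioned ?",
    "Disease": "What diseases are mentioned ?",
}
examples = to_kgqa_examples(tokens, tags, contexts)
```

Because each example asks about a single entity type, the label set stays at B, I, O regardless of how many datasets or entity types are combined, and overlapping (nested) mentions of different types end up in separate examples.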
Pages: 24