Named Entity Recognition and Relation Extraction for COVID-19: Explainable Active Learning with Word2vec Embeddings and Transformer-Based BERT Models

Times cited: 2
Authors
Arguello-Casteleiro, M. [1]
Maroto, N. [2]
Wroe, C. [3]
Torrado, C. Sevillano [4]
Henson, C. [5]
Des-Diz, J. [4]
Fernandez-Prieto, M. J. [6]
Furmston, T. [1]
Fernandez, D. Maseda [5]
Kulshrestha, M. [5]
Stevens, R. [1]
Keane, J. [1]
Peters, S. [1]
Affiliations
[1] Univ Manchester, Manchester, Lancs, England
[2] Univ Politecn Madrid, Madrid, Spain
[3] BMJ, London, England
[4] Hosp Salnes, Pontevedra, Spain
[5] Midcheshire Hosp Fdn Trust, Crewe, England
[6] Univ Salford, Salford, Lancs, England
Source
ARTIFICIAL INTELLIGENCE XXXVIII, 2021, Vol. 13101
Keywords
Deep learning for natural language processing; Transfer learning; Embeddings; Transformer-based models; Explainable active learning;
DOI
10.1007/978-3-030-91100-3_14
Chinese Library Classification number
TP18 [Theory of artificial intelligence];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Deep learning for natural language processing acquires dense vector representations for n-grams from large-scale unstructured corpora. Converting static embeddings of n-grams into a dataset of interlinked concepts with explicit contextual semantic dependencies provides the foundation for acquiring reusable knowledge. However, validating this knowledge requires cross-checking against ground truths that may be unavailable in an actionable or computable form. This paper presents a novel approach from the new field of explainable active learning that combines methods for learning static embeddings (word2vec models) with methods for learning dynamic contextual embeddings (transformer-based BERT models). We created a dataset for named entity recognition (NER) and relation extraction (REX) for Coronavirus Disease 2019 (COVID-19). The COVID-19 dataset contains 2,212 associations captured by 11 word2vec models, with additional examples of use from the biomedical literature. We propose interpreting the NER and REX tasks for COVID-19 as Question Answering (QA) that incorporates general medical knowledge within the question, e.g. "does 'cough' (n-gram) belong to 'clinical presentation/symptoms' for COVID-19?". We evaluated biomedical-specific pre-trained language models (BioBERT, SciBERT, ClinicalBERT, BlueBERT, and PubMedBERT) against general-domain pre-trained language models (BERT and RoBERTa) for transfer learning with the COVID-19 dataset, i.e. task-specific fine-tuning that treats NER as a sequence-level task. Using 2,060 QA instances for training (associations from 10 word2vec models) and 152 QA instances for validation (associations from 1 word2vec model), BERT obtained an F-measure of 87.38% (precision = 93.75%, recall = 81.82%). SciBERT achieved the highest F-measure, 94.34% (precision = 98.04%, recall = 90.91%).
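The QA framing and the reported scores can be illustrated with a small Python sketch. This is not the authors' implementation: the `Association` schema, the exact wording produced by `build_question`, the model identifiers such as `w2v_01`, and all helper names are hypothetical. `split_by_model` mirrors the described protocol of training on associations from some word2vec models and validating on those from a held-out model, and `f_measure` recomputes the F-measure as the harmonic mean of precision and recall. The fine-tuning of the BERT-family models themselves is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Association:
    """One n-gram/category association (hypothetical schema, not the paper's)."""
    model_id: str   # hypothetical ID of the word2vec model that produced it
    ngram: str      # candidate entity, e.g. "cough"
    category: str   # target concept, e.g. "clinical presentation/symptoms"
    label: bool     # True if the association is judged correct


def build_question(ngram: str, category: str, disease: str = "COVID-19") -> str:
    """Phrase an association as a yes/no question (template wording assumed)."""
    return f"Does '{ngram}' belong to '{category}' for {disease}?"


def split_by_model(associations, held_out_model):
    """Put every association from one word2vec model in the validation set."""
    train = [a for a in associations if a.model_id != held_out_model]
    valid = [a for a in associations if a.model_id == held_out_model]
    return train, valid


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1); 0.0 if both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy, hypothetical associations from two word2vec models.
    data = [
        Association("w2v_01", "cough", "clinical presentation/symptoms", True),
        Association("w2v_11", "fever", "clinical presentation/symptoms", True),
    ]
    train, valid = split_by_model(data, held_out_model="w2v_11")
    for a in train:
        print(build_question(a.ngram, a.category), a.label)
    # Recompute the F-measures from the precision/recall reported above.
    print(round(f_measure(0.9375, 0.8182), 4))  # BERT
    print(round(f_measure(0.9804, 0.9091), 4))  # SciBERT
```

Holding out an entire word2vec model, rather than sampling individual associations, keeps all associations produced by that model out of training, which is how the abstract describes its 10-model/1-model split.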
Pages: 158-163
Number of pages: 6