Named Entity Recognition and Relation Extraction for COVID-19: Explainable Active Learning with Word2vec Embeddings and Transformer-Based BERT Models

Cited by: 2

Authors:

Arguello-Casteleiro, M. [1]

Maroto, N. [2]

Wroe, C. [3]

Torrado, C. Sevillano [4]

Henson, C. [5]

Des-Diz, J. [4]

Fernandez-Prieto, M. J. [6]

Furmston, T. [1]

Fernandez, D. Maseda [5]

Kulshrestha, M. [5]

Stevens, R. [1]

Keane, J. [1]

Peters, S. [1]

Affiliations:

[1] Univ Manchester, Manchester, Lancs, England

[2] Univ Politecn Madrid, Madrid, Spain

[3] BMJ, London, England

[4] Hosp Salnes, Pontevedra, Spain

[5] Midcheshire Hosp Fdn Trust, Crewe, England

[6] Univ Salford, Salford, Lancs, England

Source:

ARTIFICIAL INTELLIGENCE XXXVIII | 2021 / Vol. 13101

Keywords:

Deep learning for natural language processing; Transfer learning; Embeddings; Transformer-based models; Explainable active learning;

DOI:

10.1007/978-3-030-91100-3_14

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Deep learning for natural language processing acquires dense vector representations for n-grams from large-scale unstructured corpora. Converting static embeddings of n-grams into a dataset of interlinked concepts with explicit contextual semantic dependencies provides the foundation to acquire reusable knowledge. However, the validation of this knowledge requires cross-checking with ground-truths that may be unavailable in an actionable or computable form. This paper presents a novel approach from the new field of explainable active learning that combines methods for learning static embeddings (word2vec models) with methods for learning dynamic contextual embeddings (transformer-based BERT models). We created a dataset for named entity recognition (NER) and relation extraction (REX) for the Coronavirus Disease 2019 (COVID-19). The COVID-19 dataset has 2,212 associations captured by 11 word2vec models with additional examples of use from the biomedical literature. We propose interpreting the NER and REX tasks for COVID-19 as Question Answering (QA) incorporating general medical knowledge within the question, e.g. "does 'cough' (n-gram) belong to 'clinical presentation/symptoms' for COVID-19?". We evaluated biomedical-specific pre-trained language models (BioBERT, SciBERT, ClinicalBERT, BlueBERT, and PubMedBERT) versus general-domain pre-trained language models (BERT and RoBERTa) for transfer learning with the COVID-19 dataset, i.e. task-specific fine-tuning considering NER as a sequence-level task. Using 2,060 QA for training (associations from 10 word2vec models) and 152 QA for validation (associations from 1 word2vec model), BERT obtained an F-measure of 87.38%, with precision = 93.75% and recall = 81.82%. SciBERT achieved the highest F-measure of 94.34%, with precision = 98.04% and recall = 90.91%.
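
As an illustration of the abstract only, the following minimal Python sketch frames one association as a question using the wording of the quoted example, and checks the reported scores. The function names `build_question` and `f_measure` are hypothetical and not taken from the paper. The sketch also assumes that "F-measure" means the balanced F1 score (harmonic mean of precision and recall); the abstract does not say so explicitly, although the reported figures are consistent with that reading.

```python
def build_question(ngram: str, category: str, disease: str = "COVID-19") -> str:
    """Frame one (n-gram, category) association as a yes/no question.

    The template copies the example quoted in the abstract; the function
    itself is a hypothetical helper, not the authors' code.
    """
    return f"does '{ngram}' (n-gram) belong to '{category}' for {disease}?"


def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (assumed F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(build_question("cough", "clinical presentation/symptoms"))
    # Precision and recall values (in percent) as reported in the abstract.
    for model, p, r in [("BERT", 93.75, 81.82), ("SciBERT", 98.04, 90.91)]:
        print(f"{model}: F = {f_measure(p, r):.2f}%")
```

Under the F1 assumption, the two computed values round to the 87.38% and 94.34% reported for BERT and SciBERT.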

Pages: 158-163

Page count: 6
