Named Entity Recognition From Biomedical Data

Cited by: 0
Authors
Refaat, Maged [1]
Rafea, Ahmed [1]
Gaballah, Nada [1]
Affiliations
[1] Amer Univ Cairo AUC, Comp Sci & Engn Dept, Cairo, Egypt
Source
2023 INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE AND COMPUTATIONAL INTELLIGENCE, CSCI 2023 | 2023
Keywords
NER; named entity recognition; chemprot; drugprot; BERT; SciBERT; BioBERT
DOI
10.1109/CSCI62032.2023.00141
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Vast amounts of textual data are now available digitally. Consequently, automated tools are needed to extract relevant, meaningful information. Named entity recognition (NER) is the task of identifying text spans that refer to named entities and classifying them into predefined categories. Although numerous NER methods exist, named entity recognition in the biomedical domain is under-studied. The objective of this research paper is to introduce an efficient approach for NER tasks on biomedical data. The investigated approach uses pre-trained models such as BERT and its variants SciBERT and BioBERT, together with deep learning techniques. Our hypothesis is that training on preprocessed text will improve model performance. Therefore, we will investigate the effect of adding basic preprocessing rules, such as removing punctuation, white space, and well-known stop words like articles, pronouns, prepositions, and conjunctions. We will also investigate removing irrelevant parts of the text based on part-of-speech tagging, such as verbs and adjectives. The effect of text preprocessing on model performance will be monitored. Our model outperforms vanilla BERT and BioBERT, reaching a precision of 66.20%, a recall of 98.96%, and an F1 score of 79.33%.
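The basic preprocessing rules named in the abstract could be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the stop-word list, the BIO tags `B-GENE` and `B-CHEMICAL`, the function names `preprocess` and `f1_score`, and the choice to never drop a token that carries an entity label are all assumptions made here. The second function only checks that the reported F1 score is consistent with the reported precision and recall.

```python
import string

# Hypothetical, deliberately tiny stop-word list (articles, pronouns,
# prepositions, conjunctions); the list used in the paper is not given here.
STOP_WORDS = {"a", "an", "the", "it", "they", "of", "in", "on", "by",
              "with", "and", "or", "but"}


def preprocess(tokens, labels):
    """Drop punctuation, whitespace-only tokens and stop words,
    keeping the token and label sequences aligned."""
    kept_tokens, kept_labels = [], []
    for token, label in zip(tokens, labels):
        stripped = token.strip()
        is_blank = stripped == ""
        is_punct = not is_blank and all(ch in string.punctuation for ch in stripped)
        is_stop = stripped.lower() in STOP_WORDS
        # Assumption: a token that carries an entity label is never dropped.
        if label == "O" and (is_blank or is_punct or is_stop):
            continue
        kept_tokens.append(stripped)
        kept_labels.append(label)
    return kept_tokens, kept_labels


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


tokens = ["The", "inhibition", "of", "COX-2", "by", "aspirin", "."]
labels = ["O", "O", "O", "B-GENE", "O", "B-CHEMICAL", "O"]
print(preprocess(tokens, labels))
# (['inhibition', 'COX-2', 'aspirin'], ['O', 'B-GENE', 'B-CHEMICAL'])

print(round(f1_score(66.20, 98.96), 2))
# 79.33
```

Filtering tokens and labels together matters for NER, since dropping a token without its label would shift every later annotation by one position.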
Pages: 838-844
Number of pages: 7
Related papers
28 in total
[1]  
Ammar Waleed, 2018, NAACL-HLT, V3, P84, DOI 10.18653/v1/n18-3011
[2]  
[Anonymous], About us
[3]  
Beltagy I, 2019, 2019 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND THE 9TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (EMNLP-IJCNLP 2019), P3615
[4]  
Collobert R, 2011, J MACH LEARN RES, V12, P2493
[5]  
Devlin J., ARXIV
[6]   Unsupervised named-entity extraction from the Web: An experimental study [J].
Etzioni, O ;
Cafarella, M ;
Downey, D ;
Popescu, AM ;
Shaked, T ;
Soderland, S ;
Weld, DS ;
Yates, A .
ARTIFICIAL INTELLIGENCE, 2005, 165 (01) :91-134
[7]   Recent Named Entity Recognition and Classification techniques: A systematic review [J].
Goyal, Archana ;
Gupta, Vishal ;
Kumar, Manish .
COMPUTER SCIENCE REVIEW, 2018, 29 :21-43
[8]  
Grishman R., 1997, Information Extraction. A Multidisciplinary Approach to an Emerging Information Technology International Summer School, SCIE-97, P10
[9]  
Grishman R., MESSAGE UNDERSTANDIN, P6
[10]   Deep learning with word embeddings improves biomedical named entity recognition [J].
Habibi, Maryam ;
Weber, Leon ;
Neves, Mariana ;
Wiegandt, David Luis ;
Leser, Ulf .
BIOINFORMATICS, 2017, 33 (14) :I37-I48