Named Entity Recognition From Biomedical Data

Cited by: 0
Authors
Refaat, Maged [1]
Rafea, Ahmed [1]
Gaballah, Nada [1]
Affiliation
[1] Amer Univ Cairo AUC, Comp Sci & Engn Dept, Cairo, Egypt
Source
2023 INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE AND COMPUTATIONAL INTELLIGENCE, CSCI 2023 | 2023
Keywords
NER; named entity recognition; chemprot; drugprot; BERT; SciBERT; BioBERT
DOI
10.1109/CSCI62032.2023.00141
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Vast amounts of textual data are now available digitally, so automated tools are needed to extract relevant, meaningful information from them. Named entity recognition (NER) is the task of identifying text spans that refer to named entities and classifying them into predefined categories. Although numerous NER methods exist, named entity recognition in the biomedical domain is under-studied. The objective of this paper is to introduce an efficient approach to NER on biomedical data. The investigated approach uses pre-trained models such as BERT and its variants SciBERT and BioBERT, together with deep learning techniques. Our hypothesis is that training on preprocessed text will improve model performance. We therefore investigate the effect of basic preprocessing rules: removing punctuation and white space, and removing well-known stop words such as articles, pronouns, prepositions, and conjunctions. We also investigate removing irrelevant parts of the text based on part-of-speech tagging, such as verbs and adjectives, and we monitor the effect of each preprocessing step on model performance. Our model outperforms vanilla BERT and BioBERT, reaching a precision of 66.20%, a recall of 98.96%, and an F1 score of 79.33%.
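The basic preprocessing rules named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the stop-word list, the function names `preprocess` and `f1_score`, and the choice to drop only tokens labeled `O` (so that entity tokens and their labels stay aligned) are all assumptions made for illustration. Part-of-speech filtering is omitted because it would need an external tagger. The second function simply checks that the reported F1 score is the harmonic mean of the reported precision and recall.

```python
import string

# Hypothetical, deliberately tiny stop list; the abstract does not specify one.
STOP_WORDS = {"a", "an", "the", "he", "she", "it", "they",
              "of", "in", "on", "by", "with", "and", "or", "but"}


def preprocess(tokens, labels):
    """Drop whitespace, punctuation, and stop-word tokens, keeping labels aligned.

    Assumption: only tokens labeled "O" (outside any entity) are ever removed,
    so entity spans are never broken by preprocessing.
    """
    kept_tokens, kept_labels = [], []
    for token, label in zip(tokens, labels):
        stripped = token.strip()
        if label == "O":
            if not stripped:
                continue  # whitespace-only token
            if all(ch in string.punctuation for ch in stripped):
                continue  # pure punctuation
            if stripped.lower() in STOP_WORDS:
                continue  # stop word
        kept_tokens.append(stripped)
        kept_labels.append(label)
    return kept_tokens, kept_labels


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both given as fractions)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


tokens = ["The", "aspirin", " ", "inhibits", "COX-1", "."]
labels = ["O", "B-CHEMICAL", "O", "O", "B-GENE", "O"]
clean_tokens, clean_labels = preprocess(tokens, labels)
print(clean_tokens)   # ['aspirin', 'inhibits', 'COX-1']
print(clean_labels)   # ['B-CHEMICAL', 'O', 'B-GENE']

# Consistency check on the figures reported in the abstract.
print(round(f1_score(0.6620, 0.9896), 4))  # 0.7933
```

The example sentence and its labels are made up; the last line shows that the reported precision and recall do give the reported F1 score of 79.33%.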
Pages: 838-844
Page count: 7