An attention-based BiLSTM-CRF approach to document-level chemical named entity recognition

Cited by: 328
Authors
Luo, Ling [1]
Yang, Zhihao [1]
Yang, Pei [1]
Zhang, Yin [2]
Wang, Lei [2]
Lin, Hongfei [1]
Wang, Jian [1]
Affiliations
[1] Dalian Univ Technol, Coll Comp Sci & Technol, Dalian 116024, Peoples R China
[2] Beijing Inst Hlth Adm & Med Informat, Beijing 100850, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
CHEMDNER; DATABASE; SYSTEM; DRUGS
DOI
10.1093/bioinformatics/btx761
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: In biomedical research, chemicals are an important class of entities, and chemical named entity recognition (NER) is an important task in biomedical information extraction. However, most popular chemical NER methods are based on traditional machine learning, and their performance depends heavily on feature engineering. Moreover, these methods operate at the sentence level and therefore suffer from the tagging inconsistency problem.

Results: In this paper, we propose a neural network approach to document-level chemical NER: an attention-based bidirectional Long Short-Term Memory network with a conditional random field layer (Att-BiLSTM-CRF). The approach leverages document-level global information obtained by an attention mechanism to enforce tagging consistency across multiple instances of the same token in a document. With little feature engineering, it outperforms other state-of-the-art methods on the BioCreative IV chemical compound and drug name recognition (CHEMDNER) corpus and the BioCreative V chemical-disease relation (CDR) task corpus (F-scores of 91.14% and 92.57%, respectively).
Pages: 1381-1388
Page count: 8