Chemlistem: chemical named entity recognition using recurrent neural networks

Cited by: 30
Authors
Corbett, Peter [1]
Boyle, John [1]
Affiliations
[1] Royal Soc Chem, Technol Dept, Data Sci Grp, Cambridge, England
Source
JOURNAL OF CHEMINFORMATICS | 2018 / Volume 10
Keywords
Chemicals; Named entity recognition; Deep learning
DOI
10.1186/s13321-018-0313-8
Chinese Library Classification number
O6 [Chemistry]
Discipline classification code
0703
Abstract
Chemical named entity recognition (NER) has traditionally been dominated by conditional random field (CRF)-based approaches, but given the success of the artificial neural network techniques known as deep learning, we decided to examine them as an alternative to CRFs. We present here several chemical named entity recognition systems. The first system translates the traditional CRF-based idioms into a deep learning framework, using rich per-token features and neural word embeddings, and producing a sequence of tags using bidirectional long short-term memory (LSTM) networks, a type of recurrent neural network. The second system eschews the rich feature set, and even tokenisation, in favour of character labelling using neural character embeddings and multiple LSTM layers. The third system is an ensemble that combines the results of the first two systems. Our original BioCreative V.5 competition entry was placed in the top group with the highest F scores, and subsequent work using transfer learning has achieved a final F score of 90.33% on the test data (precision 91.47%, recall 89.21%).
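The reported F score can be checked against the stated precision and recall. The short sketch below assumes that "F score" here means the balanced F1 measure (the harmonic mean of precision and recall), which the abstract does not state explicitly; the function name f_score is hypothetical and used for illustration only.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F1: harmonic mean of precision and recall.

    Assumption: the abstract's "F score" is F1 (equal weight on
    precision and recall). Inputs and output share the same unit
    (here, percentages).
    """
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both are zero
    return 2 * precision * recall / (precision + recall)


# Figures taken from the abstract: precision 91.47%, recall 89.21%.
value = f_score(91.47, 89.21)
print(f"{value:.2f}")  # prints 90.33
```

Under that assumption, the computed value agrees with the 90.33% given in the abstract.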
Pages: 9