Deep learning with word embeddings improves biomedical named entity recognition

Cited by: 339
Authors
Habibi, Maryam [1]
Weber, Leon [1]
Neves, Mariana [2]
Wiegandt, David Luis [1]
Leser, Ulf [1]
Affiliations
[1] Humboldt Univ, Dept Comp Sci, D-10099 Berlin, Germany
[2] Hasso Plattner Inst, Enterprise Platform & Integrat Concepts, D-14482 Potsdam, Germany
DOI
10.1093/bioinformatics/btx228
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: Text mining has become an important tool for biomedical research. The most fundamental text-mining task is the recognition of biomedical named entities (NER), such as genes, chemicals and diseases. Current NER methods rely on pre-defined features which try to capture the specific surface properties of entity types, properties of the typical local context, background knowledge, and linguistic information. State-of-the-art tools are entity-specific, as dictionaries and empirically optimal feature sets differ between entity types, which makes their development costly. Furthermore, features are often optimized for a specific gold standard corpus, which makes extrapolation of quality measures difficult. Results: We show that a completely generic method based on deep learning and statistical word embeddings [called long short-term memory network-conditional random field (LSTM-CRF)] outperforms state-of-the-art entity-specific NER tools, and often by a large margin. To this end, we compared the performance of LSTM-CRF on 33 data sets covering five different entity classes with that of best-of-class NER tools and an entity-agnostic CRF implementation. On average, F1-score of LSTM-CRF is 5% above that of the baselines, mostly due to a sharp increase in recall.
Pages: i37-i48
Number of pages: 12