Deep learning with word embeddings improves biomedical named entity recognition

Cited by: 335
Authors
Habibi, Maryam [1]
Weber, Leon [1]
Neves, Mariana [2]
Wiegandt, David Luis [1]
Leser, Ulf [1]
Affiliations
[1] Humboldt Univ, Dept Comp Sci, D-10099 Berlin, Germany
[2] Hasso Plattner Inst, Enterprise Platform & Integrat Concepts, D-14482 Potsdam, Germany
DOI
10.1093/bioinformatics/btx228
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Text mining has become an important tool for biomedical research. The most fundamental text-mining task is the recognition of biomedical named entities (NER), such as genes, chemicals and diseases. Current NER methods rely on pre-defined features which try to capture the specific surface properties of entity types, properties of the typical local context, background knowledge, and linguistic information. State-of-the-art tools are entity-specific, as dictionaries and empirically optimal feature sets differ between entity types, which makes their development costly. Furthermore, features are often optimized for a specific gold standard corpus, which makes extrapolation of quality measures difficult.
Results: We show that a completely generic method based on deep learning and statistical word embeddings [called long short-term memory network-conditional random field (LSTM-CRF)] outperforms state-of-the-art entity-specific NER tools, and often by a large margin. To this end, we compared the performance of LSTM-CRF on 33 data sets covering five different entity classes with that of best-of-class NER tools and an entity-agnostic CRF implementation. On average, F1-score of LSTM-CRF is 5% above that of the baselines, mostly due to a sharp increase in recall.
Pages: I37-I48
Number of pages: 12