Named Entity Recognition for Biomedical Patent Text using Bi-LSTM Variants

Cited by: 3
Authors
Saad, Farag [1]
Affiliations
[1] FIZ Karlsruhe, Leibniz Inst Informat Infrastruct, Eggenstein Leopoldshafen, Baden Wurttembe, Germany
Source
IIWAS2019: THE 21ST INTERNATIONAL CONFERENCE ON INFORMATION INTEGRATION AND WEB-BASED APPLICATIONS & SERVICES | 2019
Keywords
NER; Biomedical; Deep Learning; Neural Network; LSTM; Bi-LSTM
DOI
10.1145/3366030.3366104
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Recent years have seen a substantial increase in biomedical publications (patents or scientific articles), with new documents appearing daily. This has led to increased interest in extracting meaningful information (e.g., named entities) from these publications. Traditional NER approaches demand considerable engineering skill and domain expertise to design rules and features that improve accuracy. In addition, because of the structural and linguistic complexity of patent text, constructing such rules and features is often challenging. In this paper, we investigate the performance of several Bi-LSTM model variants on the NER task, using features generated automatically from unlabelled gene and protein patent corpora. The proposed model is able to capture the context representation of an input sequence and globally assign the related labels to each token. The CHARS-Bi-LSTM-EMA variant yielded the best performance and significantly outperformed the state-of-the-art approach.
Pages: 617-621
Number of pages: 5