Use of morphological analysis in protein name recognition

Cited by: 8
Authors
Yamamoto, K
Kudo, T
Konagaya, A
Matsumoto, Y
Affiliations
[1] RIKEN, Inst Phys & Chem Res, Genom Sci Ctr, Bioinformat Grp, Wako, Saitama, Japan
[2] Nara Inst Sci & Technol, Grad Sch Informat Sci, Nara, Japan
Keywords
protein name recognition; named entity recognition; morphological analysis; tokenization and part-of-speech ambiguity; changing nomenclature in biomedicine; SVMs (kernel method) and feature engineering
DOI
10.1016/j.jbi.2004.08.001
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Protein name recognition aims to detect every protein name appearing in a PubMed abstract. The task is not simple, because the graphic word boundary (the space separator) assumed in conventional preprocessing does not necessarily coincide with the protein name boundary. Such boundary disagreement, caused by tokenization ambiguity, has usually been ignored in conventional preprocessing of general English. In this paper, we argue that boundary disagreement poses serious limitations in biomedical English text processing in general, and in protein name recognition in particular. Our key idea for dealing with boundary disagreement is to apply techniques used in Japanese morphological analysis, where the text has no explicit word boundaries. Evaluated on GENIA corpus 3.02, the proposed method obtains an F-measure of 69.01 under a strict criterion and 79.32 under a relaxed criterion. The result is comparable to other published work in protein name recognition, without resorting to manually prepared ad hoc feature engineering. Further, compared with conventional preprocessing, the use of morphological analysis as preprocessing improves the performance of protein name recognition and reduces the execution time. © 2004 Elsevier Inc. All rights reserved.
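The idea of treating text as a character stream without trusted word boundaries can be illustrated with a minimal sketch. This is not the authors' system: the lexicon, its costs, and the names `LEXICON`, `UNKNOWN_CHAR_COST`, and `segment` are hypothetical, and the dynamic program below is only a generic minimum-cost, lattice-style segmentation of the kind commonly used in morphological analysis of unsegmented languages.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon: surface form -> cost (lower is preferred).
# Entries may contain spaces, so a token is not forced to stop at a space.
LEXICON: Dict[str, float] = {
    "NF-kappa B": 1.0,
    "NF-kappa": 2.0,
    "B": 2.0,
    "IL-2": 1.0,
    "-": 1.5,
    "mediated": 1.0,
    "activation": 1.0,
    " ": 0.5,
}
UNKNOWN_CHAR_COST = 5.0  # assumed fallback cost for a single unlisted character


def segment(text: str) -> List[str]:
    """Return the minimum-cost segmentation of text over LEXICON."""
    n = len(text)
    max_len = max(len(w) for w in LEXICON)
    # best[i] = (lowest cost of segmenting text[:i], start of the last token)
    best: List[Tuple[float, int]] = [(float("inf"), -1)] * (n + 1)
    best[0] = (0.0, -1)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            piece = text[start:end]
            if piece in LEXICON:
                cost = LEXICON[piece]
            elif end - start == 1:
                cost = UNKNOWN_CHAR_COST  # keeps every position reachable
            else:
                continue
            total = best[start][0] + cost
            if total < best[end][0]:
                best[end] = (total, start)
    # Follow the back-pointers from the end of the text to recover tokens.
    tokens: List[str] = []
    pos = n
    while pos > 0:
        start = best[pos][1]
        tokens.append(text[start:pos])
        pos = start
    tokens.reverse()
    return tokens


if __name__ == "__main__":
    print(segment("NF-kappa B activation"))
    print(segment("IL-2-mediated activation"))
```

With this toy lexicon, "NF-kappa B" is kept as one token even though it contains a space, while "IL-2-mediated" is split so that "IL-2" becomes a token of its own. Both are cases where splitting on spaces alone would put the token boundary in the wrong place.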
Pages: 471-482
Page count: 12