Use of morphological analysis in protein name recognition

Cited by: 8
Authors
Yamamoto, K
Kudo, T
Konagaya, A
Matsumoto, Y
Affiliations
[1] RIKEN, Inst Phys & Chem Res, Genom Sci Ctr, Bioinformat Grp, Wako, Saitama, Japan
[2] Nara Inst Sci & Technol, Grad Sch Informat Sci, Nara, Japan
Keywords
protein name recognition; named entity recognition; morphological analysis; tokenization and part-of-speech ambiguity; changing nomenclature in biomedicine; SVMs (kernel method) and feature engineering
DOI
10.1016/j.jbi.2004.08.001
Chinese Library Classification
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
Protein name recognition aims to detect every protein name appearing in a PubMed abstract. The task is not simple, because the graphic word boundary (the space separator) assumed in conventional preprocessing does not necessarily coincide with the protein name boundary. Such boundary disagreement, caused by tokenization ambiguity, has usually been ignored in conventional preprocessing of general English. In this paper, we argue that boundary disagreement poses serious limitations in biomedical English text processing, not to mention protein name recognition. Our key idea for dealing with boundary disagreement is to apply techniques used in Japanese morphological analysis, where there are no word boundaries. Evaluating the proposed method on GENIA corpus 3.02, we obtain an F-measure of 69.01 under a strict criterion and 79.32 under a relaxed criterion. The result is comparable to other published work in protein name recognition, without resorting to manually prepared ad hoc feature engineering. Further, compared with conventional preprocessing, the use of morphological analysis as preprocessing improves the performance of protein name recognition and reduces the execution time. (C) 2004 Elsevier Inc. All rights reserved.
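The boundary disagreement described in the abstract can be illustrated with a minimal Python sketch. The example sentences, the annotated span, and the function names are hypothetical and are not taken from the paper or from the GENIA corpus; the sketch only checks whether a name's character span lines up with space-separated token boundaries and does not implement the morphological analysis method itself.

```python
def whitespace_token_spans(text):
    """Return (start, end) character offsets of space-separated tokens."""
    spans = []
    pos = 0
    for tok in text.split(" "):
        if tok:  # skip empty strings produced by consecutive spaces
            spans.append((pos, pos + len(tok)))
        pos += len(tok) + 1  # +1 for the space separator
    return spans


def boundaries_agree(text, start, end):
    """True if the span [start, end) starts and ends on token boundaries."""
    spans = whitespace_token_spans(text)
    starts = {s for s, _ in spans}
    ends = {e for _, e in spans}
    return start in starts and end in ends


# Hypothetical annotation: the protein name "IL-2" is embedded in a
# larger space-delimited token, so its right boundary falls mid-token.
sentence = "IL-2-mediated activation of T cells"
name_start = sentence.index("IL-2")
name_end = name_start + len("IL-2")
print(boundaries_agree(sentence, name_start, name_end))  # False

# Here the same name is a token of its own, so the boundaries agree.
other = "activation of IL-2 receptor"
other_start = other.index("IL-2")
print(boundaries_agree(other, other_start, other_start + len("IL-2")))  # True
```

In the first sentence a tagger that labels whole space-delimited tokens cannot mark "IL-2" exactly, which is the kind of limitation the abstract attributes to conventional preprocessing.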
Pages: 471-482
Page count: 12
Related papers
50 in total
  • [1] ON THE USE OF MORPHOLOGICAL ANALYSIS FOR DIALECTAL ARABIC SPEECH RECOGNITION
    Afify, Mohamed
    Sarikaya, Ruhi
    Kuo, Hong-Kwang Jeff
    Besacier, Laurent
    Gao, Yuqing
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 277 - 280
  • [2] Flat and Nested Protein Name Recognition Based on BioBERT and Biaffine Decoder
    Tang, Zhan
    Kou, Xupeng
    Xue, Hongcheng
    Xia, Yuantian
    BIOINFORMATICS RESEARCH AND APPLICATIONS, PT I, ISBRA 2024, 2024, 14954 : 25 - 38
  • [3] Protein Name Recognition Based on Dictionary Mining and Heuristics
    Lin, Shian-Hua
    Ding, Shao-Hong
    Zeng, Wei-Sheng
    ALGORITHMIC ASPECTS IN INFORMATION AND MANAGEMENT, AAIM 2014, 2014, 8546 : 75 - 87
  • [4] Morphological analysis of the 'context of use' Application of morphological analysis to the collaborative understanding of the context of use
    Winter, Dominique
    Hausmann, Carolin
    Schoen, Eva-Maria
    Thomaschewski, Joerg
    2020 15TH IBERIAN CONFERENCE ON INFORMATION SYSTEMS AND TECHNOLOGIES (CISTI'2020), 2020,
  • [5] Improving the performance of dictionary-based approaches in protein name recognition
    Tsuruoka, Y
    Tsujii, J
    JOURNAL OF BIOMEDICAL INFORMATICS, 2004, 37 (06) : 461 - 470
  • [6] Effective integration of morphological analysis and named entity recognition based on a recurrent neural network
    Lee, Hyeon-gu
    Park, Geonwoo
    Kim, Harksoo
    PATTERN RECOGNITION LETTERS, 2018, 112 : 361 - 365
  • [7] Integrated Model for Morphological Analysis and Named Entity Recognition Based on Label Attention Networks in Korean
    Kim, Hongjin
    Kim, Harksoo
    APPLIED SCIENCES-BASEL, 2020, 10 (11):
  • [8] PERSONAL NAME AND LOCATION NAME RECOGNITION BASED ON CONDITIONAL RANDOM FIELDS
    Zhang, Su-Xiang
    Gao, Guo-Yang
    Qi, Yin-Cheng
    PROCEEDINGS OF 2009 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-6, 2009, : 2255 - 2259
  • [9] Algorithms of the Cluster and Morphological Analysis for Mineral Rocks Recognition in the Mining Industry
    Baklanova, Olga E.
    Baklanov, Mikhail A.
    INTELLIGENT COMPUTING THEORIES AND APPLICATION, ICIC 2016, PT II, 2016, 9772 : 268 - 278
  • [10] Rule Based Product Name Recognition and Disambiguation
    Godeny, Balazs
    12TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS (ICDMW 2012), 2012, : 858 - 860