Use of morphological analysis in protein name recognition

Cited by: 8
Authors
Yamamoto, K
Kudo, T
Konagaya, A
Matsumoto, Y
Affiliations
[1] RIKEN, Inst Phys & Chem Res, Genom Sci Ctr, Bioinformat Grp, Wako, Saitama, Japan
[2] Nara Inst Sci & Technol, Grad Sch Informat Sci, Nara, Japan
Keywords
protein name recognition; named entity recognition; morphological analysis; tokenization and part-of-speech ambiguity; changing nomenclature in biomedicine; SVMs (kernel method) and feature engineering
DOI
10.1016/j.jbi.2004.08.001
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Protein name recognition aims to detect every protein name appearing in a PubMed abstract. The task is not simple, because the graphic word boundary (the space separator) assumed in conventional preprocessing does not necessarily coincide with the protein name boundary. Such boundary disagreement, caused by tokenization ambiguity, has usually been ignored in conventional preprocessing of general English. In this paper, we argue that boundary disagreement poses serious limitations in biomedical English text processing, not to mention protein name recognition. Our key idea for dealing with boundary disagreement is to apply techniques used in Japanese morphological analysis, where there are no word boundaries. Evaluating the proposed method on GENIA corpus 3.02, we obtain an F-measure of 69.01 under a strict criterion and 79.32 under a relaxed criterion. The result is comparable to other published work in protein name recognition, without resorting to manually prepared ad hoc feature engineering. Further, compared to conventional preprocessing, the use of morphological analysis as preprocessing improves the performance of protein name recognition and reduces the execution time. © 2004 Elsevier Inc. All rights reserved.
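The boundary disagreement described in the abstract can be illustrated with a minimal sketch. This is not the authors' method: the example sentence, the function names, and the character-class segmentation rule are all assumptions made for illustration. It only shows that a protein name span may fail to align with space-separated tokens while aligning with a finer-grained segmentation.

```python
import re

def whitespace_tokens(text):
    """Character spans (start, end) of space-separated tokens."""
    return [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]

def fine_grained_tokens(text):
    """Hypothetical finer segmentation: letter runs, digit runs, single symbols."""
    pattern = r"[A-Za-z]+|[0-9]+|[^\sA-Za-z0-9]"
    return [(m.start(), m.end()) for m in re.finditer(pattern, text)]

def aligns(span, tokens):
    """True if both ends of the span fall on token boundaries."""
    starts = {s for s, _ in tokens}
    ends = {e for _, e in tokens}
    return span[0] in starts and span[1] in ends

# Illustrative sentence (an assumption, not taken from the GENIA corpus).
text = "SHP-1-deficient mice show altered signaling"
name = "SHP-1"
start = text.index(name)
span = (start, start + len(name))

print(aligns(span, whitespace_tokens(text)))    # False: "SHP-1-deficient" is one token
print(aligns(span, fine_grained_tokens(text)))  # True: the name ends on a token boundary
```

With whitespace tokenization the name is buried inside a single token, so a tagger that labels whole tokens cannot mark it exactly; a finer segmentation makes the name boundary reachable.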
Pages: 471-482
Page count: 12