Use of morphological analysis in protein name recognition

Cited by: 8
Authors
Yamamoto, K
Kudo, T
Konagaya, A
Matsumoto, Y
Affiliations
[1] RIKEN, Inst Phys & Chem Res, Genom Sci Ctr, Bioinformat Grp, Wako, Saitama, Japan
[2] Nara Inst Sci & Technol, Grad Sch Informat Sci, Nara, Japan
Keywords
protein name recognition; named entity recognition; morphological analysis; tokenization and part-of-speech ambiguity; changing nomenclature in biomedicine; SVMs (kernel method) and feature engineering
DOI
10.1016/j.jbi.2004.08.001
Chinese Library Classification
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
Protein name recognition aims to detect every protein name appearing in a PubMed abstract. The task is not simple, because the graphic word boundary (the space separator) assumed in conventional preprocessing does not necessarily coincide with the protein name boundary. Such boundary disagreement, caused by tokenization ambiguity, has usually been ignored in conventional preprocessing of general English. In this paper, we argue that boundary disagreement poses serious limitations in biomedical English text processing in general, and in protein name recognition in particular. Our key idea for dealing with boundary disagreement is to apply techniques used in Japanese morphological analysis, where there are no word boundaries. Evaluating the proposed method on GENIA corpus 3.02, we obtain an F-measure of 69.01 under a strict criterion and 79.32 under a relaxed criterion. The result is comparable to other published work in protein name recognition, without resorting to manually prepared ad hoc feature engineering. Further, compared with conventional preprocessing, the use of morphological analysis as preprocessing improves the performance of protein name recognition and reduces the execution time. (C) 2004 Elsevier Inc. All rights reserved.
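The boundary disagreement described in the abstract can be illustrated with a small sketch. This is not the authors' system: the lexicon, its costs, the `unknown_cost` fallback, and the function names `whitespace_tokens` and `segment` are all hypothetical, chosen only to show how a dictionary-driven, minimum-cost segmentation over a character lattice (the general idea behind Japanese-style morphological analysis) can expose a protein name that whitespace tokenization hides inside a larger token.

```python
# Toy illustration only: lexicon entries and costs are invented for this example.
TOY_LEXICON = {
    "IL-2": 1.0,        # hypothetical protein-name entry
    "-": 2.0,
    "mediated": 1.0,
    " ": 0.5,
    "activation": 1.0,
}


def whitespace_tokens(text):
    """Conventional preprocessing: split on the space separator."""
    return text.split()


def segment(text, lexicon, unknown_cost=10.0):
    """Minimum-cost segmentation of `text` over a character lattice.

    Each lexicon entry that matches at a position is an edge with its cost;
    any single character is also allowed as an 'unknown' edge at a high cost.
    Dynamic programming (Viterbi-style) picks the cheapest path.
    """
    n = len(text)
    max_len = max(len(w) for w in lexicon)
    best = [float("inf")] * (n + 1)   # best[j]: cheapest cost to cover text[:j]
    back = [0] * (n + 1)              # back[j]: start index of the last segment
    best[0] = 0.0
    for i in range(n):
        # Candidate edges leaving position i: (end index, edge cost).
        candidates = [(i + 1, unknown_cost)]
        for j in range(i + 1, min(n, i + max_len) + 1):
            word = text[i:j]
            if word in lexicon:
                candidates.append((j, lexicon[word]))
        for j, cost in candidates:
            if best[i] + cost < best[j]:
                best[j] = best[i] + cost
                back[j] = i
    # Follow back-pointers from the end to recover the segments.
    segments = []
    j = n
    while j > 0:
        i = back[j]
        segments.append(text[i:j])
        j = i
    return segments[::-1]


if __name__ == "__main__":
    sentence = "IL-2-mediated activation"
    print(whitespace_tokens(sentence))
    print(segment(sentence, TOY_LEXICON))
```

In this toy case the whitespace tokenizer yields `IL-2-mediated` as a single token, so the name `IL-2` never appears as a unit, whereas the lattice segmentation returns `IL-2` as its own segment. A real system would use a trained cost model and a full dictionary rather than hand-set costs.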
Pages: 471-482
Page count: 12