A text mining approach to detect mentions of protein glycosylation in biomedical text

Cited by: 0
Authors
Shukla, Daksha [1]
Jayaraman, Valadi K. [2]
Affiliations
[1] Univ Pune, Bioinformat Ctr, Pune, Maharashtra, India
[2] Univ Pune, Ctr Dev Adv Comp, Pune, Maharashtra, India
Keywords
Text mining; Glycosylation; Rule-based approach; Dictionary-based approach
DOI
10.6026/97320630008758
Chinese Library Classification number
Q [Biological sciences]
Discipline classification codes
07; 0710; 09
Abstract
Protein glycosylation is an important post-translational event that plays a pivotal role in protein folding and protein trafficking. We describe a dictionary-based and a rule-based approach to mine 'mentions' of protein glycosylation in text. The dictionary-based approach relies on a set of manually curated dictionaries constructed specifically for this task. Abstracts are screened for 'mentions' of words from these dictionaries; the mentions are scored, and each abstract is then classified on the basis of a threshold. The rule-based approach also relies on the words in the dictionaries to derive the features used for classification. The performance of the system with both approaches was evaluated on a manually curated corpus of 3133 abstracts. The evaluation suggests that the rule-based approach outperforms the dictionary-based approach.
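The dictionary-based step described in the abstract (screen an abstract for dictionary words, score the hits, classify against a threshold) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the terms, their weights, the tokenisation, and the threshold value below are all hypothetical placeholders, since the abstract does not give the curated dictionaries or the scoring scheme.

```python
import re

# Hypothetical dictionary: term -> weight. The paper's manually curated
# dictionaries and their scoring are not given in the abstract.
DICTIONARY = {
    "glycosylation": 2.0,
    "glycosylated": 2.0,
    "glycoprotein": 1.5,
    "n-linked": 1.0,
    "o-linked": 1.0,
    "glycan": 1.0,
}

# Assumed cut-off; in practice it would be tuned on an annotated corpus.
THRESHOLD = 2.0


def score_abstract(text, dictionary=DICTIONARY):
    """Sum the weights of all dictionary terms found in the text."""
    # Lowercase and split into tokens, keeping hyphenated words together.
    tokens = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())
    return sum(dictionary.get(token, 0.0) for token in tokens)


def mentions_glycosylation(text, threshold=THRESHOLD):
    """Classify an abstract as positive if its score reaches the threshold."""
    return score_abstract(text) >= threshold
```

For example, `mentions_glycosylation("The N-linked glycan was removed.")` matches two terms of weight 1.0 each and reaches the assumed threshold, while a sentence with no dictionary terms scores 0.0 and is classified as negative.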
Pages: 758-762
Page count: 5